We study the problem of estimating the coefficients of a diffusion $(X_t, t\ge 0)$; the estimation is based on discrete data $X_{n\Delta}$, $n = 0,1,\dots,N$. The sampling frequency $\Delta^{-1}$ is constant, and asymptotics are taken as the number $N$ of observations tends to infinity. We prove that the problem of estimating both the diffusion coefficient (the volatility) and the drift in a nonparametric setting is ill-posed: the minimax rates of convergence for Sobolev constraints and squared-error loss coincide with those of a first- and a second-order linear inverse problem, respectively. To ensure ergodicity and limit technical difficulties we restrict ourselves to scalar diffusions living on a compact interval with reflecting boundary conditions.

Our approach is based on the spectral analysis of the associated Markov semigroup. A rate-optimal estimation of the coefficients is obtained via the nonparametric estimation of an eigenvalue-eigenfunction pair of the transition operator of the discrete time Markov chain $(X_{n\Delta}, n = 0,1,\dots,N)$ in a suitable Sobolev norm, together with an estimation of its invariant density.

1. Introduction. 

1.1. Overview. Since Feller's celebrated classification, stationary scalar diffusions have served as a representative model for homogeneous Markov processes in continuous time. Historically, diffusion processes were probably first seen as approximation models for discrete Markov chains, up to an appropriate rescaling in time and space. More recently, the development of financial mathematics has argued in favor of genuinely continuous time models, with simple dynamics governed by a local mean (drift) $b(\cdot)$ and local variance (diffusion coefficient, or volatility) $\sigma(\cdot)$ on the state space $S = \mathbb{R}$ or $S \subset \mathbb{R}$ with appropriate boundary conditions. The dynamics are usually described by an Itô-type stochastic differential equation in the interior of $S$, which in the time-homogeneous case reads

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad t > 0,$$

where the driving process $(W_t, t\ge 0)$ is standard Brownian motion. The growing importance of diffusion models progressively prompted, within the community of statisticians, a vast research program from both quantitative and theoretical angles. We outline the main achievements of this program in Section 1.2.

In the late 1970s a statistician was able to characterize qualitatively the properties of a parametric ergodic diffusion model based on the continuous observation of a sample path

$$X^T := (X_t, 0\le t\le T)$$

of the trajectory, as $T\to\infty$, that is, as the time length of the experiment grows to infinity, a necessary assumption to ensure the growth of information thanks to the recurrence of the sample path. The 1980s explored various discretization schemes of the continuous time model: the data $X^T$ could progressively be replaced by the more realistic observation

$$X^{(N,\Delta_N)} := (X_{n\Delta_N}, n = 0,1,\dots,N),$$

with asymptotics taken as $N\to\infty$. The discretization techniques used at that time required the high frequency sampling assumption $\Delta_N\to 0$ whereas $N\Delta_N\to\infty$ in order to guarantee the closeness of $X^{(N,\Delta_N)}$ and $X^T$, with $T = N\Delta_N$. Soon, a similar nonparametric program was achieved for both continuous time and high frequency data.

By the early to mid-1990s, the frontier remained the "fixed $\Delta$ case," that is, the case of low frequency data. This is the topic of the present paper. First, one must understand the importance and flexibility gained by being able to relax the assumption that the sampling time $\Delta$ between two data points is "small": indeed, one can hardly deny that, in practice, it may well happen that sampling with arbitrarily small $\Delta$ is simply not feasible. Put differently, the asymptotic statistical theory is a mathematical construct to assess the quality of an estimator based on discrete observations, and it must be decided which asymptotics are adequate for the data at hand. Second, the statistical nature of the problem drastically changes when passing from high to low frequency sampling: the approximation properties of the sample path $X^T$ by $X^{(N,\Delta)}$ are not valid anymore; the observation $(X_0, X_\Delta, \dots, X_{N\Delta})$ becomes a genuine Markov chain, and inference about the underlying coefficients of the diffusion process must be sought via the identification of the law of the observation $X^{(N,\Delta)}$. In the time-homogeneous case the mathematical properties of the random vector $X^{(N,\Delta)}$ are embodied in the transition operator

$$P_\Delta f(x) := E[f(X_\Delta)\mid X_0 = x],$$

defined on appropriate test functions $f$. Under suitable assumptions, the operator $P_\Delta$ is associated with a Feller semigroup $(P_t, t\ge 0)$ with a densely defined infinitesimal generator $L$ on the space of continuous functions given by

$$Lf(x) = L_{\sigma,b}f(x) := \frac{\sigma^2(x)}{2}f''(x) + b(x)f'(x).$$

The second-order term $\sigma(\cdot)$ is the diffusion coefficient, and the first-order term $b(\cdot)$ is the drift coefficient. Postulating the existence of an invariant density $\mu(\cdot) = \mu_{\sigma,b}(\cdot)$, the operator $L$ is unbounded, but self-adjoint and negative on $L^2(\mu) := \{f \mid \int |f|^2\,d\mu < \infty\}$, and the functional calculus gives the correspondence

$$P_\Delta = \exp(\Delta L) \qquad (1.1)$$

in the operator sense. Therefore, a consistent statistical program can be presumed to start from the observed Markov chain $X^{(N,\Delta)}$, estimate its transition operator $P_\Delta$ and infer about the pair $(b(\cdot),\sigma(\cdot))$ via the correspondence (1.1), in other words via the spectral properties of the operator $P_\Delta$. Expressed in a diagram, we obtain the following line:

$$\text{data} \overset{(E)}{\longrightarrow} P_\Delta \longleftrightarrow L \overset{(I)}{\longleftrightarrow} (b(\cdot),\sigma(\cdot)) = \text{parameter}. \qquad (1.2)$$

The efficiency of a given statistical estimation procedure will be measured by the proficiency in combining the estimation part (E) and the identification part (I) of the model.
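To make (1.1) concrete, here is a standard closed-form case, chosen only for illustration: Brownian motion reflected at the endpoints of $[0,1]$ ($\sigma\equiv 1$, $b\equiv 0$ in the reflecting model of Section 2). Its invariant density is uniform, its generator acts as $Lf = \frac12 f''$ on functions with $f'(0) = f'(1) = 0$, and the cosines diagonalize both $L$ and $P_\Delta$:

$$L\cos(k\pi x) = -\frac{k^2\pi^2}{2}\cos(k\pi x), \qquad P_\Delta\cos(k\pi x) = e^{-\Delta k^2\pi^2/2}\cos(k\pi x), \qquad k = 0,1,2,\dots$$

Each eigenvalue $\kappa$ of $P_\Delta$ thus determines the eigenvalue $\Delta^{-1}\log\kappa$ of $L$ for the same eigenfunction, which is the correspondence $P_\Delta \leftrightarrow L$ in (1.2).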

The works of Hansen, Scheinkman and Touzi (1998) and Chen, Hansen and Scheinkman (1997) paved the way: they formulated a precise and thorough program, proposing and discussing several methods for identifying scalar diffusions via their spectral properties. Simultaneously, the Danish school, given impetus by the works of Kessler and Sørensen (1999), systematically studied the parametric efficiency of spectral methods in the fixed $\Delta$ setting described above. By constructing estimating functions based on eigenfunctions of the operator $L$, they could construct $\sqrt{N}$-consistent estimators and obtained precise asymptotic properties.

However, a quantitative study of nonparametric estimation in the fixed $\Delta$ context remained out of reach for some time, for both technical and conceptual reasons. The purpose of the present paper is to fill this gap by trying to understand and explain why the nonparametric case significantly differs from its parametric analogue, as well as from the high frequency data framework in nonparametrics.

We are going to establish minimax rates of convergence over various smoothness classes, characterizing upper and lower bounds for estimating $b(\cdot)$ and $\sigma(\cdot)$ based on the observation of $X_0, X_\Delta, \dots, X_{N\Delta}$, with asymptotics taken as $N\to\infty$. The minimax rate of convergence is an index of both the accuracy of estimation and the complexity of the model. We will show that in the nonparametric case the complexity of the problems of estimating $b(\cdot)$ and $\sigma(\cdot)$ is related to ill-posed inverse problems. Although we mainly focus on the theoretical aspects of the statistical model, the estimators we propose are based on feasible nonparametric smoothing methods: they can be implemented in practice, allowing for adaptivity and finite sample optimization. Some simulation results can be found in Reiß (2003).

The estimation problem is formulated exactly in Section 2, where the main theoretical results are also stated. The spectral estimation method we adopt is explained in Section 3, which includes a discussion of related problems and possible extensions. The proofs of the upper bound for our estimator and of its optimality in a minimax sense are given in Sections 4 and 5, respectively. Results of a rather technical nature are deferred to Section 6.

1.2. Statistical estimation for diffusions: an outlook. We give a brief and 
selective summary of the evolution of the area over the last two decades. The 
nonparametric identification of diffusion processes from continuous data was 
probably first addressed in the reference paper of Banon (1978). More precise 
estimation results can be listed as follows: 

1.2.1. Continuous or high frequency data: the parametric case. Estimation of a finite-dimensional parameter $\theta$ from $X^T = (X_t, 0\le t\le T)$ with asymptotics as $T\to\infty$ when $X$ is a diffusion of the form

$$dX_t = b_\theta(X_t)\,dt + \sigma(X_t)\,dW_t \qquad (1.3)$$

is classical [Brown and Hewitt (1975) and Kutoyants (1975)]. Here $(W_t, t\ge 0)$ is a standard Wiener process. The diffusion coefficient is perfectly identified from the data by means of the quadratic variation of $X$. By assuming the process $X$ to be ergodic (positively recurrent), a sufficiently regular parametrization $\theta\mapsto b_\theta(\cdot)$ implies the local asymptotic normality (LAN) property for the underlying statistical model, therefore ensuring the $\sqrt{T}$-consistency and efficiency of the ML-estimator [see Liptser and Shiryaev (2001)].

In the case of discrete data $X_{n\Delta_N}$, $n = 0,1,\dots,N$, with high frequency sampling $\Delta_N^{-1}\to\infty$, but long range observation $N\Delta_N\to\infty$ as $N\to\infty$, various discretization schemes and estimating procedures had been proposed [Yoshida (1992) and Kessler (1997)] until Gobet (2002) eventually proved the LAN property for ergodic diffusions of the form

$$dX_t = b_{\theta_1}(X_t)\,dt + \sigma_{\theta_2}(X_t)\,dW_t \qquad (1.4)$$

in a general setting, by means of the Malliavin calculus: under suitable regularity conditions, the finite-dimensional parameter $\theta_1$ in the drift term can be estimated with optimal rate $\sqrt{N\Delta_N}$, whereas the finite-dimensional parameter $\theta_2$ in the diffusion coefficient is estimated with the optimal rate $\sqrt{N}$.

1.2.2. Continuous or high frequency data: the nonparametric case. A similar program was progressively carried out in nonparametrics: if the drift function $b(\cdot)$ is globally unknown in the model given by (1.3), but belongs to a Sobolev ball $\mathcal{S}(s,L)$ (of smoothness order $s > 0$ and radius $L$) over a given compact interval $\mathcal{I}$, a certain kernel estimator $\hat b_T(\cdot)$ achieves the following upper bound in $L^2(\mathcal{I})$ and in a root-mean-squared sense:

$$\sup_{b\in\mathcal{S}(s,L)} E\big[\|\hat b_T - b\|_{L^2(\mathcal{I})}^2\big]^{1/2} \lesssim T^{-s/(2s+1)}.$$

This already indicates a formal analogy with the model of nonparametric regression or "signal + white noise," where the same rate holds. (Here and in the sequel, the symbol $\lesssim$ means "up to constants," possibly depending on the parameters of the problem, but continuous in their arguments.) See Kutoyants (1984) for precise mathematical results.

Similar extensions to the discrete case with high frequency data sampling for the model driven by (1.4) were given in Hoffmann (1999), where the rates $(N\Delta_N)^{-s/(2s+1)}$ for the drift function $b(\cdot)$ and $N^{-s/(2s+1)}$ for the diffusion coefficient $\sigma(\cdot)$ have been obtained and proved to be optimal. See also the pioneering paper of Pham (1981). Methods of this kind have been successfully applied to financial data [Aït-Sahalia (1996), Stanton (1997), Chapman and Pearson (2000) and Fan and Zhang (2003)]. In particular, it is investigated whether the usual parametric model assumptions are compatible with the data, and the use of nonparametric methods is advocated.

1.2.3. From high to low frequency data. As soon as the sampling frequency $\Delta_N^{-1}$ is not large anymore, the problem of estimating a parameter in the drift or diffusion coefficient becomes significantly more difficult: the trajectory properties that can be recovered from the data when $\Delta_N$ is small are lost. In particular, there is no evident approximating scheme that can efficiently compute or mimic the continuous ML-estimator in parametric estimation.

Likewise, the usual nonparametric kernel estimators, based on differencing, do not provide consistent estimation of the drift $b(\cdot)$ or the diffusion coefficient $\sigma(\cdot)$. As a concrete example, consider the standard Nadaraya-Watson estimator $\hat b(x)$ of the drift $b(x)$ at the point $x\in\mathbb{R}$:

$$\hat b(x) := \frac{(N\Delta)^{-1}\sum_{n=0}^{N-1} K_h(x - X_{n\Delta})\,(X_{(n+1)\Delta} - X_{n\Delta})}{N^{-1}\sum_{n=0}^{N-1} K_h(x - X_{n\Delta})}$$

with a kernel function $K(\cdot)$ and $K_h(x) := h^{-1}K(h^{-1}x)$ for $h > 0$. If we let $N\to\infty$ and $h\to 0$, then by the ratio ergodic theorem and by kernel properties we obtain almost surely the limit

$$E[\Delta^{-1}(X_\Delta - X_0)\mid X_0 = x] = \Delta^{-1}\int_0^\Delta P_t b(x)\,dt.$$

Hence, this estimator is not consistent. It merely yields a highly blurred version of $b(x)$, which of course tends to $b(x)$ in the high frequency limit $\Delta\to 0$. Note that the transition operators $P_t$ involved depend on the unknown functions $b(\cdot)$ and $\sigma(\cdot)$ as a whole. The situation for estimators of $\sigma(\cdot)$ based on the approximation of the quadratic variation is even worse, because the drift $b(\cdot)$ enters directly into the limit expression.
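The inconsistency is easy to see numerically. The sketch below is illustrative only and rests on assumptions not made in the text: it uses an Ornstein-Uhlenbeck process on $\mathbb{R}$ with drift $b(x) = -\theta x$ and constant $\sigma$ (no reflection), a Gaussian kernel, and NumPy. For this process $E[X_\Delta - X_0\mid X_0 = x] = (e^{-\theta\Delta} - 1)x$, so the estimator should settle near $\Delta^{-1}(e^{-\theta\Delta} - 1)x$ rather than near $b(x) = -\theta x$.

```python
import numpy as np


def simulate_ou(n_obs, delta, theta=1.0, sigma=1.0, seed=0):
    """Exact samples at times n*delta of dX = -theta X dt + sigma dW (stationary start)."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-theta * delta)                  # AR(1) coefficient e^{-theta delta}
    stat_sd = sigma / np.sqrt(2.0 * theta)        # stationary standard deviation
    innov_sd = stat_sd * np.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x = np.empty(n_obs + 1)
    x[0] = rng.normal(0.0, stat_sd)
    eps = rng.normal(0.0, innov_sd, n_obs)
    for n in range(n_obs):
        x[n + 1] = rho * x[n] + eps[n]
    return x


def nadaraya_watson_drift(x_obs, delta, x, h):
    """Increment-based kernel drift estimator with a Gaussian kernel K."""
    weights = np.exp(-0.5 * ((x - x_obs[:-1]) / h) ** 2)  # K_h(x - X_{n delta}) up to 1/h
    increments = np.diff(x_obs)                            # X_{(n+1) delta} - X_{n delta}
    return np.sum(weights * increments) / (delta * np.sum(weights))


delta, theta = 1.0, 1.0
path = simulate_ou(200_000, delta, theta=theta)
b_hat = nadaraya_watson_drift(path, delta, x=1.0, h=0.1)
true_drift = -theta * 1.0
blurred_limit = (np.exp(-theta * delta) - 1.0) / delta * 1.0
print(b_hat, true_drift, blurred_limit)
```

With $\theta = \Delta = 1$ and $x = 1$ the true drift is $-1$, while the blurred limit is $e^{-1} - 1 \approx -0.632$; the estimate lands near the latter, up to kernel bias and sampling error.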

1.2.4. Spectral methods for parametric estimation. Kessler and Sørensen (1999) suggested the use of eigenvalues $\kappa_\theta$ and eigenfunctions $\varphi_\theta(\cdot)$ of the parametrized infinitesimal generator

$$L_\theta f(x) = \frac{\sigma_\theta^2(x)}{2}f''(x) + b_\theta(x)f'(x),$$

that is, such that $L_\theta\varphi_\theta(x) = \kappa_\theta\varphi_\theta(x)$. Indeed, since the pair $(\kappa_\theta,\varphi_\theta)$ also satisfies

$$P_\Delta\varphi_\theta(X_{n\Delta}) = E[\varphi_\theta(X_{(n+1)\Delta})\mid X_{n\Delta}] = \exp(\kappa_\theta\Delta)\,\varphi_\theta(X_{n\Delta}),$$

the knowledge of a pair $(\kappa_\theta,\varphi_\theta)$, whenever it is easy to compute, can be translated into a set of conditional moment conditions to be used in estimating functions. With their method, Kessler and Sørensen can construct $\sqrt{N}$-consistent estimators that are nearly efficient. See also the paper of Hansen, Scheinkman and Touzi (1998) that we already mentioned.
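A minimal instance of this idea, assuming (for illustration only) an Ornstein-Uhlenbeck generator $L_\theta f(x) = \frac{\sigma^2}{2}f''(x) - \theta x f'(x)$ on $\mathbb{R}$: the function $\varphi_\theta(x) = x$ satisfies $L_\theta\varphi_\theta = -\theta\varphi_\theta$, so $\kappa_\theta = -\theta$ and the conditional moment condition $E[X_{(n+1)\Delta} - e^{-\theta\Delta}X_{n\Delta}\mid X_{n\Delta}] = 0$ holds. Weighting it by $X_{n\Delta}$ gives an estimating equation with a closed-form root. This is a sketch of the principle, not the estimator of Kessler and Sørensen; the simulation helper and the NumPy usage are our own choices.

```python
import numpy as np


def simulate_ou(n_obs, delta, theta=1.0, sigma=1.0, seed=0):
    """Exact samples at times n*delta of dX = -theta X dt + sigma dW (stationary start)."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-theta * delta)
    stat_sd = sigma / np.sqrt(2.0 * theta)
    x = np.empty(n_obs + 1)
    x[0] = rng.normal(0.0, stat_sd)
    eps = rng.normal(0.0, stat_sd * np.sqrt(1.0 - rho ** 2), n_obs)
    for n in range(n_obs):
        x[n + 1] = rho * x[n] + eps[n]
    return x


def estimate_theta(x_obs, delta):
    """Root in theta of sum_n X_n (X_{n+1} - exp(-theta*delta) X_n) = 0."""
    rho_hat = np.dot(x_obs[:-1], x_obs[1:]) / np.dot(x_obs[:-1], x_obs[:-1])
    return -np.log(rho_hat) / delta   # kappa_theta = -theta = log(rho) / delta


path = simulate_ou(20_000, delta=1.0, theta=1.0)
theta_hat = estimate_theta(path, delta=1.0)
print(theta_hat)
```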

In a sense, this idea also lies at the core of our method. However, the strategy of Kessler and Sørensen is not easily extendable to nonparametrics: there is no straightforward way to pass from a finite-dimensional parametrization of the generator $L_\theta$ with explicit eigenpairs $(\kappa_\theta,\varphi_\theta)$ to a full nonparametric space with satisfactory approximation properties. Besides, there would be no evident route to treat the variance of such speculative nonparametric estimators either, because the behavior of the parametric Fisher information matrix for a growing number of parameters is too complex to be easily controlled. We will see in Section 3 how to overcome these objections by estimating an eigenpair directly and nonparametrically.
1.2.5. Perspectives. A quick summary yields Table 1 for optimal rates of convergence.

Table 1 can be interpreted as follows: the difficulty of the estimation problem increases from top to bottom and from left to right. A blank line separates the continuous and high frequency (HF) data domain from the low frequency (LF) data domain. The breach for LF data opened by Kessler and Sørensen as well as by Hansen, Scheinkman and Touzi shows that $\sqrt{N}$-consistent estimators usually exist in the parametric case. The remaining entries, which we aim to determine, are the rates of convergence for LF data in the nonparametric case: $u_N$ for the drift $b(\cdot)$ and $v_N$ for the diffusion coefficient $\sigma(\cdot)$.

2. Main results. 

2.1. A diffusion model with boundary reflections. We shall restrict ourselves to reflecting diffusions on a one-dimensional interval to avoid highly nontrivial technical issues; see the discussion in Section 3.3.

Choosing for convenience the interval $[0,1]$, we suppose the following.

Assumption 2.1. The function $b:[0,1]\to\mathbb{R}$ is measurable and bounded, the function $\sigma:[0,1]\to(0,\infty)$ is continuous and positive and the function $\nu:[0,1]\to\mathbb{R}$ satisfies $\nu(0) = 1$, $\nu(1) = -1$.

We consider the stochastic differential equation

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t + \nu(X_t)\,dL_t(X), \qquad X_0 = x_0 \text{ and } X_t\in[0,1]\ \ \forall t\ge 0. \qquad (2.1)$$

The process $(W_t, t\ge 0)$ is a standard Brownian motion and $(L_t(X), t\ge 0)$ is a nonanticipative continuous nondecreasing process that increases only when $X_t\in\{0,1\}$. The boundedness of $b(\cdot)$ and the ellipticity of $\sigma(\cdot)$ ensure the existence of a weak solution; see for instance Stroock and Varadhan (1971). Note that the process $L(X)$ is part of the solution and is given by a difference of local times of $X$ at the boundary points of $[0,1]$.

Table 1

                 Parametric                                  Nonparametric
                 $b$                     $\sigma$            $b$                           $\sigma$
Continuous       $T^{-1/2}$              known               $T^{-s/(2s+1)}$               known
HF data          $(N\Delta_N)^{-1/2}$    $N^{-1/2}$          $(N\Delta_N)^{-s/(2s+1)}$     $N^{-s/(2s+1)}$

LF data          $N^{-1/2}$              $N^{-1/2}$          $u_N$                         $v_N$
Due to the compactness of $[0,1]$ and the reflecting boundary conditions, the Markov process $X$ has a spectral gap, which implies geometric ergodicity; compare Lemmas 6.1 and 6.2. In particular, a unique invariant measure $\mu$ exists and the one-dimensional distributions of $X_t$ converge exponentially fast to $\mu$ as $t\to\infty$, so that the assumption of stationarity can be made without loss of generality for asymptotic results.

We denote by $\mathbb{P}_{\sigma,b}$ the law of the associated stationary diffusion on the canonical space $\Omega = C(\mathbb{R}_+,[0,1])$ of continuous functions over the positive axis with values in $[0,1]$, equipped with the topology of uniform convergence and endowed with its Borel $\sigma$-field $\mathcal{F}$. We denote by $E_{\sigma,b}$ the corresponding expectation operator. Given $N\ge 1$ and $\Delta > 0$, we observe the canonical process $(X_t, t\ge 0)$ at equidistant times $n\Delta$ for $n = 0,1,\dots,N$. Let $\mathcal{F}_N$ denote the $\sigma$-field generated by $\{X_{n\Delta}\mid n = 0,\dots,N\}$.

Definition 2.2. An estimator of the pair $(\sigma(\cdot), b(\cdot))$ is an $\mathcal{F}_N$-measurable function on $\Omega$ with values in $L^2([0,1])\times L^2([0,1])$.

To assess the $L^2$-risk in a minimax framework, we introduce the nonparametric set $\Theta_s$, which consists of pairs of functions of regularity $s$ and $s-1$, respectively.

Definition 2.3. For $s > 1$ and given constants $C > c > 0$, we consider the class $\Theta_s := \Theta(s,C,c)$ defined by

$$\Big\{(\sigma,b)\in H^s([0,1])\times H^{s-1}([0,1]) \;\Big|\; \|\sigma\|_{H^s}\le C,\ \|b\|_{H^{s-1}}\le C,\ \inf_x \sigma(x)\ge c\Big\},$$

where $H^s$ denotes the $L^2$-Sobolev space of order $s$.

Note that all $(\sigma(\cdot), b(\cdot))\in\Theta_s$ satisfy Assumption 2.1.

2.2. Minimax rates of convergence. We are now in a position to state the main theorems. By (3.12) and (3.13) in the next section we define estimators $\hat\sigma^2$ and $\hat b$ using a spectral estimation method based on the observation $(X_0, X_\Delta, \dots, X_{N\Delta})$. These estimators, which of course depend on the number $N$ of observations, satisfy the following uniform asymptotic upper bounds.


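Theorem 2.4. For all $s > 1$, $C > c > 0$ and $0 < a < b < 1$ we have

$$\sup_{(\sigma,b)\in\Theta_s} E_{\sigma,b}\big[\|\hat\sigma^2 - \sigma^2\|_{L^2([a,b])}^2\big]^{1/2} \lesssim N^{-s/(2s+3)},$$

$$\sup_{(\sigma,b)\in\Theta_s} E_{\sigma,b}\big[\|\hat b - b\|_{L^2([a,b])}^2\big]^{1/2} \lesssim N^{-(s-1)/(2s+3)}.$$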
Recall that $A\lesssim B$ means that $A$ can be bounded by a constant multiple of $B$, where the constant depends continuously on other parameters involved. Similarly, $A\gtrsim B$ is equivalent to $B\lesssim A$, and $A\sim B$ holds if both relations $A\lesssim B$ and $A\gtrsim B$ are true.

As the following lower bounds prove, the rates of convergence of our estimators are optimal in a minimax sense over $\Theta_s$.


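Theorem 2.5. Let $\mathcal{E}_N$ denote the set of all estimators according to Definition 2.2. Then for all $0 < a < b < 1$ and $s > 1$ the following hold:

$$\inf_{\hat\sigma^2\in\mathcal{E}_N}\ \sup_{(\sigma,b)\in\Theta_s} E_{\sigma,b}\big[\|\hat\sigma^2 - \sigma^2\|_{L^2([a,b])}^2\big]^{1/2} \gtrsim N^{-s/(2s+3)}, \qquad (2.2)$$

$$\inf_{\hat b\in\mathcal{E}_N}\ \sup_{(\sigma,b)\in\Theta_s} E_{\sigma,b}\big[\|\hat b - b\|_{L^2([a,b])}^2\big]^{1/2} \gtrsim N^{-(s-1)/(2s+3)}. \qquad (2.3)$$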


If we set $s_1 = s - 1$ and $s_2 = s$, then the drift $b(\cdot)$, which has regularity $s_1$, can be estimated with the minimax rate of convergence $u_N = N^{-s_1/(2s_1+5)}$, whereas the diffusion coefficient $\sigma(\cdot)$ has regularity $s_2$ and the corresponding minimax rate of convergence is $v_N = N^{-s_2/(2s_2+3)}$. Hence, Table 1 in Section 1.2.5 can be filled with two rather unexpected rates of convergence $u_N$ and $v_N$. In Section 3.3 the rates are explained in the terminology of ill-posed problems, and reasons are given why the tight connection between the regularity assumptions on $b(\cdot)$ and $\sigma(\cdot)$ is needed.
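The exponents of $u_N$ and $v_N$ follow the familiar pattern $N^{-r/(2r+2d+1)}$ for recovering a function of smoothness $r$ in a linear inverse problem of ill-posedness degree $d$ ($d = 0$ being direct estimation). The helper names below are hypothetical; the snippet only checks that degree 1 with $r = s_2 = s$ and degree 2 with $r = s_1 = s-1$ reproduce the exponents of Theorems 2.4 and 2.5.

```python
from fractions import Fraction


def minimax_exponent(r, degree):
    """Exponent e in the rate N^(-e) = N^(-r / (2r + 2*degree + 1))."""
    r = Fraction(r)
    return r / (2 * r + 2 * degree + 1)


def sigma_exponent(s):
    # diffusion coefficient: regularity s_2 = s, ill-posedness degree 1
    return minimax_exponent(s, 1)


def drift_exponent(s):
    # drift: regularity s_1 = s - 1, ill-posedness degree 2
    return minimax_exponent(Fraction(s) - 1, 2)


for s in (2, 3, 4):
    print(s, sigma_exponent(s), drift_exponent(s))
# s = 2 gives 2/7 and 1/7, i.e. v_N = N^(-2/7) and u_N = N^(-1/7)
```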


3. Spectral estimation method. 


3.1. The basic idea. We shall base our estimators of the diffusion coefficient $\sigma(\cdot)$ and of the drift coefficient $b(\cdot)$ on spectral methods for passing from the transition operator $P_\Delta$, which is approximately known to us, to the infinitesimal generator $L$, which more explicitly encodes the functions $\sigma(\cdot)$ and $b(\cdot)$. In the sequel, we shall rely on classical results for scalar diffusions; for example, consult Bass [(1998), Chapter 4]. We use the specific form of the invariant density

$$\mu(x) = 2C_0\,\sigma^{-2}(x)\exp\Big(\int_0^x 2b(y)\sigma^{-2}(y)\,dy\Big) \qquad (3.1)$$

and the function $S(\cdot) = 1/s'(\cdot)$, derived from the scale function $s(\cdot)$,

$$S(x) = \tfrac12\sigma^2(x)\mu(x) = C_0\exp\Big(\int_0^x 2b(y)\sigma^{-2}(y)\,dy\Big), \qquad (3.2)$$

with the normalizing constant $C_0 > 0$ depending on $\sigma(\cdot)$ and $b(\cdot)$. The action of the generator in divergence form is given by

$$Lf(x) = L_{\sigma,b}f(x) = \tfrac12\sigma^2(x)f''(x) + b(x)f'(x) = \frac{1}{\mu(x)}\big(S(x)f'(x)\big)', \qquad (3.3)$$
where the domain of this unbounded operator on $L^2(\mu)$ is given by the subspace of the $L^2$-Sobolev space $H^2([0,1])$ with Neumann boundary conditions

$$\operatorname{dom}(L) = \{f\in H^2([0,1]) \mid f'(0) = f'(1) = 0\}.$$

The generator $L$ is a self-adjoint elliptic operator on $L^2(\mu)$ with compact resolvent, so that it has nonpositive point spectrum only. If $\nu_1$ denotes the largest negative eigenvalue of $L$ with eigenfunction $u_1$, then, due to the reflecting boundary of $[0,1]$, the Neumann boundary conditions $u_1'(0) = u_1'(1) = 0$ hold and thus

$$Lu_1 = \mu^{-1}(Su_1')' = \nu_1 u_1 \quad\Longrightarrow\quad S(x)u_1'(x) = \nu_1\int_0^x u_1(y)\mu(y)\,dy. \qquad (3.4)$$

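From (3.2) and (3.4) we can derive an explicit expression for the diffusion coefficient:

$$\sigma^2(x) = \frac{2\nu_1\int_0^x u_1(y)\mu(y)\,dy}{u_1'(x)\mu(x)}. \qquad (3.5)$$

The corresponding expression for the drift coefficient is

$$b(x) = \nu_1\,\frac{u_1(x)u_1'(x)\mu(x) - u_1''(x)\int_0^x u_1(y)\mu(y)\,dy}{u_1'(x)^2\mu(x)}. \qquad (3.6)$$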
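For completeness, both displays follow from (3.2) and (3.4) in one line each. Since $S = \frac12\sigma^2\mu$, and since differentiating (3.2) gives $S' = 2b\sigma^{-2}S = b\mu$, we have

$$\sigma^2(x) = \frac{2S(x)}{\mu(x)}, \qquad b(x) = \frac{S'(x)}{\mu(x)}, \qquad S(x) = \frac{\nu_1\int_0^x u_1(y)\mu(y)\,dy}{u_1'(x)},$$

where the last identity is (3.4). Substituting $S$ gives (3.5); the quotient rule gives $S'(x) = \nu_1\big(u_1(x)\mu(x)u_1'(x) - u_1''(x)\int_0^x u_1(y)\mu(y)\,dy\big)/u_1'(x)^2$, and dividing by $\mu(x)$ gives (3.6).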

Hence, if we knew the invariant measure $\mu$, the eigenvalue $\nu_1$ and the eigenfunction $u_1$ (including its first two derivatives), we could exactly determine the drift and diffusion coefficients. Of course, these identities are valid for any eigenfunction $u_k$ with eigenvalue $\nu_k$, but for better numerical stability we shall use only the largest nondegenerate eigenvalue $\nu_1$. Moreover, it is known that only the eigenfunction $u_1$ does not have a vanishing derivative in the interior of the interval (cf. Proposition 6.5), so that by this choice indeterminacy at interior points is avoided.
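As a sanity check of (3.5) and (3.6), the sketch below evaluates both formulas in the closed-form case of reflected Brownian motion on $[0,1]$ ($\sigma\equiv 1$, $b\equiv 0$), for which $\mu\equiv 1$, $u_1(x) = \cos(\pi x)$ and $\nu_1 = -\pi^2/2$; this example is our choice, not part of the method. The integral is computed by Simpson's rule while the derivatives of $u_1$ are exact, so the output should reproduce $\sigma^2\equiv 1$ and $b\equiv 0$ at interior points up to rounding.

```python
import math

# Closed-form case chosen for this check: reflected Brownian motion on [0, 1]
# (sigma = 1, b = 0), so mu = 1, u_1(x) = cos(pi x) and nu_1 = -pi^2 / 2.
NU1 = -math.pi ** 2 / 2


def u1(x):
    return math.cos(math.pi * x)


def du1(x):
    return -math.pi * math.sin(math.pi * x)


def d2u1(x):
    return -math.pi ** 2 * math.cos(math.pi * x)


def mu(x):
    return 1.0


def int_u1_mu(x, n=1000):
    """Composite Simpson rule for int_0^x u_1(y) mu(y) dy (n must be even)."""
    h = x / n
    total = u1(0.0) * mu(0.0) + u1(x) * mu(x)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * u1(k * h) * mu(k * h)
    return total * h / 3


def sigma2_from_spectrum(x):
    """Formula (3.5)."""
    return 2 * NU1 * int_u1_mu(x) / (du1(x) * mu(x))


def drift_from_spectrum(x):
    """Formula (3.6)."""
    numerator = u1(x) * du1(x) * mu(x) - d2u1(x) * int_u1_mu(x)
    return NU1 * numerator / (du1(x) ** 2 * mu(x))


for x in (0.25, 0.5, 0.75):
    print(x, sigma2_from_spectrum(x), drift_from_spectrum(x))
```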

Using semigroup theory [Engel and Nagel (2000), Theorem IV.3.7] we know that $u_1$ is also an eigenfunction of $P_\Delta$ with eigenvalue $\kappa_1 = e^{\Delta\nu_1}$.

Our procedure consists of determining estimators $\hat\mu$ of $\mu$ and $\hat P_\Delta$ of $P_\Delta$, calculating the corresponding eigenpair $(\hat\kappa_1, \hat u_1)$ and using (3.5) and (3.6) to build a plug-in estimator of $\sigma(\cdot)$ and $b(\cdot)$.




3.2. Construction of the estimators. We use projection methods, taking advantage of the approximation of abstract operators by finite-dimensional matrices, whose spectrum is easy to calculate numerically. A similar approach was already suggested by Chen, Hansen and Scheinkman (1997). More specifically, we make use of wavelets on the interval $[0,1]$. For the construction of wavelet bases and their properties we refer to Cohen (2000).
Definition 3.1. Let $(\psi_\lambda)$ with multiindices $\lambda = (j,k)$ be a compactly supported $L^2$-orthonormal wavelet basis of $L^2([0,1])$. The approximation spaces $(V_J)$ are defined as $L^2$-closed linear spans of the wavelets up to the frequency level $J$,

$$V_J := \operatorname{span}\{\psi_\lambda \mid |\lambda|\le J\}, \qquad\text{where } |(j,k)| := j.$$

The $L^2$-orthogonal projection onto $V_J$ is called $\pi_J$; the $L^2(\mu)$-orthogonal projection onto $V_J$ is called $\pi_J^\mu$.

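In the sequel we shall regularly use the Jackson and Bernstein inequalities with respect to the $L^2$-Sobolev spaces $H^s([0,1])$ of regularity $s$:

$$\|(\mathrm{Id} - \pi_J)f\|_{H^t} \lesssim 2^{-J(s-t)}\|f\|_{H^s} \qquad \forall f\in H^s,\ 0\le t\le s,$$

$$\|v_J\|_{H^s} \lesssim 2^{J(s-t)}\|v_J\|_{H^t} \qquad \forall v_J\in V_J,\ 0\le t\le s.$$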

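The canonical projection estimate of $\mu$ based on $(X_{n\Delta})_{0\le n\le N}$ is given by

$$\hat\mu := \sum_{|\lambda|\le J}\hat\mu_\lambda\psi_\lambda \qquad\text{with}\qquad \hat\mu_\lambda := \frac{1}{N+1}\sum_{n=0}^N \psi_\lambda(X_{n\Delta}). \qquad (3.7)$$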
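For the Haar basis (an assumption here; the text allows any compactly supported orthonormal wavelet basis) the projection estimate (3.7) is simply a histogram on the $2^J$ dyadic cells, because the scaling function and the Haar wavelets up to level $J-1$ span exactly the piecewise constants on those cells. The pure-Python sketch below checks this numerically. The level indexing, the helper names and the use of an i.i.d. stand-in sample in place of $X_{n\Delta}$ are illustrative choices; (3.7) only involves empirical averages, so the identity does not depend on how the sample was generated.

```python
import random


def haar_basis(J):
    """Haar scaling function on [0, 1) and wavelets psi_{j,k} for 0 <= j < J."""
    basis = [lambda x: 1.0 if 0.0 <= x < 1.0 else 0.0]
    for j in range(J):
        for k in range(2 ** j):
            def psi(x, j=j, k=k):
                t = 2 ** j * x - k
                if 0.0 <= t < 0.5:
                    return 2 ** (j / 2)
                if 0.5 <= t < 1.0:
                    return -(2 ** (j / 2))
                return 0.0
            basis.append(psi)
    return basis


def projection_estimate(sample, J):
    """mu_hat = sum_lambda mu_hat_lambda psi_lambda, mu_hat_lambda = mean of psi_lambda, as in (3.7)."""
    basis = haar_basis(J)
    coeffs = [sum(f(x) for x in sample) / len(sample) for f in basis]
    return lambda x: sum(c * f(x) for c, f in zip(coeffs, basis))


def histogram_estimate(sample, J):
    """Histogram on the 2**J dyadic cells of [0, 1)."""
    m = 2 ** J
    counts = [0] * m
    for x in sample:
        counts[min(int(x * m), m - 1)] += 1
    return lambda x: counts[min(int(x * m), m - 1)] * m / len(sample)


rng = random.Random(1)
sample = [rng.random() ** 2 for _ in range(2000)]  # stand-in for X_0, ..., X_{N delta}
mu_hat = projection_estimate(sample, J=3)
hist = histogram_estimate(sample, J=3)
print([round(mu_hat((i + 0.5) / 8), 4) for i in range(8)])
```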

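By the ergodicity of $X$ it follows that $\hat\mu_\lambda$ is a consistent estimate of $(\mu,\psi_\lambda)$ for $N\to\infty$. To estimate the action of the transition operator on the wavelet basis, $(\mathbf{P}_\Delta)_{\lambda,\lambda'} := (P_\Delta\psi_\lambda,\psi_{\lambda'})_\mu$, we introduce the symmetrized matrix estimator $\hat{\mathbf{P}}_\Delta$ with entries

$$(\hat{\mathbf{P}}_\Delta)_{\lambda,\lambda'} := \frac{1}{2N}\sum_{n=1}^N\big(\psi_\lambda(X_{(n-1)\Delta})\psi_{\lambda'}(X_{n\Delta}) + \psi_{\lambda'}(X_{(n-1)\Delta})\psi_\lambda(X_{n\Delta})\big). \qquad (3.8)$$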

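This yields an approximation of $(P_\Delta\psi_\lambda,\psi_{\lambda'})_\mu$, that is, of the action of the transition operator on $V_J$ with respect to the unknown scalar product $(\cdot,\cdot)_\mu$ in $L^2(\mu)$. We therefore introduce a third statistic $\hat{\mathbf{G}}$, which approximates the $\dim(V_J)\times\dim(V_J)$-dimensional Gram matrix $\mathbf{G}$ with entries $\mathbf{G}_{\lambda,\lambda'} = (\psi_\lambda,\psi_{\lambda'})_\mu$ and which is given by

$$\hat{\mathbf{G}}_{\lambda,\lambda'} := \frac{1}{N}\Big(\frac12\psi_\lambda(X_0)\psi_{\lambda'}(X_0) + \sum_{n=1}^{N-1}\psi_\lambda(X_{n\Delta})\psi_{\lambda'}(X_{n\Delta}) + \frac12\psi_\lambda(X_{N\Delta})\psi_{\lambda'}(X_{N\Delta})\Big). \qquad (3.9)$$

The particular treatment of the boundary terms will be explained later. If we put $\mathbb{X} := (w_n\psi_\lambda(X_{n\Delta}))_{|\lambda|\le J,\,0\le n\le N}$ with $w_0 = w_N = 1/\sqrt{2}$ and $w_n = 1$ otherwise, we have $\hat{\mathbf{G}} = N^{-1}\mathbb{X}\mathbb{X}^\top$, $\mathbb{X}^\top$ being the transpose of $\mathbb{X}$. Our construction can thus be regarded as a least squares type estimator, as in a usual regression setting; see the argument developed in Section 3.3.1.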

We combine the last two estimators in order to determine estimates for the eigenvalue $\kappa_1$ and the eigenfunction $u_1$ of $P_\Delta$. As will be made precise in Proposition 4.5, the operators $P_\Delta$ and $\pi_J^\mu P_\Delta$ are close for large values of $J$. Note that all eigenvectors of $\pi_J^\mu P_\Delta$ with nonzero eigenvalue lie in $V_J$, the range of $\pi_J^\mu P_\Delta$. The eigenfunction $u_1^J$ corresponding to the second largest eigenvalue $\kappa_1^J$ of $\pi_J^\mu P_\Delta$ is characterized by

$$(P_\Delta u_1^J,\psi_\lambda)_\mu = \kappa_1^J(u_1^J,\psi_\lambda)_\mu \qquad \forall |\lambda|\le J. \qquad (3.10)$$


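We pass to vector-matrix notation and from now on use bold letters to denote, for a function $v\in V_J$, the corresponding coefficient column vector $\mathbf{v} = ((v,\psi_\lambda))_{|\lambda|\le J}$. Observe carefully the different $L^2$-scalar products used; here they are with respect to the Lebesgue measure. Thus, we can rewrite (3.10) as

$$\mathbf{P}_\Delta\mathbf{u}_1^J = \kappa_1^J\,\mathbf{G}\mathbf{u}_1^J. \qquad (3.11)$$

As $\mathbf{v}^\top\mathbf{G}\mathbf{v} = (v,v)_\mu > 0$ holds for $v\in V_J\setminus\{0\}$, the matrix $\mathbf{G}$ is invertible and $(\kappa_1^J,\mathbf{u}_1^J)$ is an eigenpair of $\mathbf{G}^{-1}\mathbf{P}_\Delta$. This matrix is self-adjoint with respect to the scalar product induced by $\mathbf{G}$:

$$(\mathbf{G}^{-1}\mathbf{P}_\Delta\mathbf{v},\mathbf{w})_{\mathbf{G}} := (\mathbf{G}^{-1}\mathbf{P}_\Delta\mathbf{v})^\top\mathbf{G}\mathbf{w} = \mathbf{v}^\top\mathbf{P}_\Delta\mathbf{w} = (\mathbf{v},\mathbf{G}^{-1}\mathbf{P}_\Delta\mathbf{w})_{\mathbf{G}}.$$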

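Similarly, $\mathbf{v}^\top\hat{\mathbf{G}}\mathbf{v} = N^{-1}(\mathbb{X}^\top\mathbf{v})^\top\mathbb{X}^\top\mathbf{v}\ge 0$ holds, and the matrix $\hat{\mathbf{G}}$ can be shown to be even strictly positive definite with high probability (see Lemma 4.12). In this case, we similarly infer that $\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta$ is self-adjoint with respect to the $\hat{\mathbf{G}}$-scalar product. With $v = \sum_{|\lambda|\le J}\mathbf{v}_\lambda\psi_\lambda$, the Cauchy-Schwarz inequality and the inequality between geometric and arithmetic mean yield the estimate

$$\begin{aligned}
(\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta\mathbf{v},\mathbf{v})_{\hat{\mathbf{G}}} &= \frac{1}{N}\sum_{n=1}^N v(X_{(n-1)\Delta})\,v(X_{n\Delta})\\
&\le \frac{1}{N}\Big(\sum_{n=0}^{N-1}v(X_{n\Delta})^2\Big)^{1/2}\Big(\sum_{n=1}^{N}v(X_{n\Delta})^2\Big)^{1/2}\\
&\le \frac{1}{N}\Big(\frac12 v(X_0)^2 + \sum_{n=1}^{N-1}v(X_{n\Delta})^2 + \frac12 v(X_{N\Delta})^2\Big) = (\mathbf{v},\mathbf{v})_{\hat{\mathbf{G}}}.
\end{aligned}$$

We infer that all eigenvalues of $\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta$ are real and not larger than 1. Moreover, thanks to the weights $w_0 = w_N = 1/\sqrt{2}$, the coefficient vector $\mathbf{1}$ of the constant function $1\in V_J$ satisfies $\hat{\mathbf{P}}_\Delta\mathbf{1} = \hat{\mathbf{G}}\mathbf{1}$, so the largest eigenvalue equals 1 exactly. Hence, the second largest eigenvalue $\hat\kappa_1$ of $\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta$ is well defined, which is why we downweighted the boundary terms of $\hat{\mathbf{G}}$. The eigenvalue $\hat\kappa_1$ of $\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta$ and its corresponding eigenvector $\hat{\mathbf{u}}_1$ yield estimators of $\kappa_1^J$ and $\mathbf{u}_1^J$.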
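The construction (3.8)-(3.9) and the eigenvalue step can be sketched with NumPy as follows. Several assumptions are ours: the data are exact samples of reflected Brownian motion ($\sigma\equiv 1$, $b\equiv 0$, obtained by folding a free Brownian path into $[0,1]$), and the basis consists of indicators of $2^J$ equal cells, which span the same space as the Haar wavelets and therefore leave the generalized eigenvalues unchanged. The generalized symmetric problem $\hat{\mathbf{P}}_\Delta\mathbf{v} = \kappa\hat{\mathbf{G}}\mathbf{v}$ is solved through a Cholesky factor of $\hat{\mathbf{G}}$. For this process $\kappa_1 = e^{-\Delta\pi^2/2}\approx 0.61$ when $\Delta = 0.1$.

```python
import numpy as np


def simulate_reflected_bm(n_obs, delta, x0=0.5, seed=0):
    """Exact samples X_0, X_delta, ..., X_{n_obs*delta} of Brownian motion
    reflected at 0 and 1 (sigma = 1, b = 0), obtained by folding a free path."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, np.sqrt(delta), n_obs)
    free = x0 + np.concatenate(([0.0], np.cumsum(steps)))
    y = np.mod(free, 2.0)
    return np.where(y > 1.0, 2.0 - y, y)


def spectral_estimates(x_obs, n_bins):
    """Generalized eigenvalues of (P_hat, G_hat) in decreasing order.

    The basis is the indicators of n_bins equal cells of [0, 1]; for
    n_bins = 2**J this spans the same space as the Haar wavelets.
    """
    N = len(x_obs) - 1
    idx = np.minimum((x_obs * n_bins).astype(int), n_bins - 1)
    psi = np.zeros((n_bins, N + 1))
    psi[idx, np.arange(N + 1)] = 1.0      # psi[l, n] = psi_l(X_{n delta})
    w = np.ones(N + 1)
    w[0] = w[-1] = 1.0 / np.sqrt(2.0)     # downweighted end points, as in (3.9)
    X = psi * w
    G_hat = X @ X.T / N                   # (3.9): G_hat = X X^T / N
    C = psi[:, :-1] @ psi[:, 1:].T        # sum_n psi_l(X_{(n-1)delta}) psi_l'(X_{n delta})
    P_hat = (C + C.T) / (2.0 * N)         # (3.8), symmetrized
    R_inv = np.linalg.inv(np.linalg.cholesky(G_hat))  # G_hat = R R^T
    A = R_inv @ P_hat @ R_inv.T           # same eigenvalues as G_hat^{-1} P_hat
    A = (A + A.T) / 2.0                   # remove rounding asymmetry
    return np.linalg.eigvalsh(A)[::-1]


delta = 0.1
x_obs = simulate_reflected_bm(20_000, delta)
kappas = spectral_estimates(x_obs, n_bins=16)
kappa1_hat = kappas[1]                    # second largest eigenvalue
nu1_hat = np.log(kappa1_hat) / delta      # nu_1 = log(kappa_1) / delta
print(kappas[0], kappa1_hat, nu1_hat, -np.pi ** 2 / 2)
```

The top eigenvalue should come out as 1 up to rounding (the constant vector), all eigenvalues should be at most 1, and $\hat\kappa_1$ should be close to $0.61$, up to projection bias and sampling error.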

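Plugging the estimator $\hat\mu$ as well as $\hat\kappa_1$ and $\hat u_1 := \sum_{|\lambda|\le J}(\hat{\mathbf{u}}_1)_\lambda\psi_\lambda$ into (3.5) and (3.6), we obtain our estimators of $\sigma^2(\cdot)$ and $b(\cdot)$:

$$\hat\sigma^2(x) := \frac{2\Delta^{-1}\log(\hat\kappa_1)\int_0^x\hat u_1(y)\hat\mu(y)\,dy}{\hat u_1'(x)\hat\mu(x)}, \qquad (3.12)$$

$$\hat b(x) := \Delta^{-1}\log(\hat\kappa_1)\,\frac{\hat u_1(x)\hat u_1'(x)\hat\mu(x) - \hat u_1''(x)\int_0^x\hat u_1(y)\hat\mu(y)\,dy}{\hat u_1'(x)^2\hat\mu(x)}. \qquad (3.13)$$

To avoid indeterminacy, the denominators of the estimators are forced to remain above a certain minimal level, which depends on the subinterval $[a,b]\subset[0,1]$ on which the loss function is taken. See (4.5) for the exact formulation in the case of $\hat\sigma^2(\cdot)$ and proceed analogously for $\hat b(\cdot)$.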


3.3. Discussion. 


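3.3.1. Least squares approach. The estimator matrix $\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta$ is built as in the least squares approach for projection methods in classical regression. To estimate $P_\Delta\psi_{\lambda_0}(x) = E_{\sigma,b}[\psi_{\lambda_0}(X_\Delta)\mid X_0 = x]$, the least squares method consists of minimizing

$$\sum_{n=1}^N\Big(\psi_{\lambda_0}(X_{n\Delta}) - \sum_{|\lambda|\le J}\alpha_\lambda\psi_\lambda(X_{(n-1)\Delta})\Big)^2 \to \min! \qquad (3.14)$$

over all real coefficients $(\alpha_\lambda)$, leading to the normal equations

$$\sum_{n=1}^N\Big(\sum_{|\lambda|\le J}\alpha_\lambda\psi_\lambda(X_{(n-1)\Delta})\Big)\psi_{\lambda'}(X_{(n-1)\Delta}) = \sum_{n=1}^N\psi_{\lambda'}(X_{(n-1)\Delta})\,\psi_{\lambda_0}(X_{n\Delta})$$

for all $|\lambda'|\le J$. Up to the special treatment of the boundary terms, we thus obtain the vector $(\alpha_\lambda)$ as the column with index $\lambda_0$ in $\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta$.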
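The correspondence can be checked numerically. The sketch below (NumPy; reflected Brownian motion and an indicator basis of 8 cells are assumptions made only for illustration) solves (3.14) with `np.linalg.lstsq` for one index $\lambda_0$ and compares the result with column $\lambda_0$ of $\hat{\mathbf{G}}^{-1}\hat{\mathbf{P}}_\Delta$. The two vectors are not identical, since (3.8) is symmetrized and (3.9) downweights the end points, so only agreement up to sampling error is expected.

```python
import numpy as np


def simulate_reflected_bm(n_obs, delta, x0=0.5, seed=1):
    """Reflected Brownian motion on [0, 1] sampled exactly by folding a free path."""
    rng = np.random.default_rng(seed)
    free = x0 + np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(delta), n_obs))))
    y = np.mod(free, 2.0)
    return np.where(y > 1.0, 2.0 - y, y)


def indicator_design(x_obs, n_bins):
    """Row n holds (psi_lambda(X_{n delta}))_lambda for the cell-indicator basis."""
    idx = np.minimum((x_obs * n_bins).astype(int), n_bins - 1)
    psi = np.zeros((len(x_obs), n_bins))
    psi[np.arange(len(x_obs)), idx] = 1.0
    return psi


x_obs = simulate_reflected_bm(50_000, delta=0.1)
psi = indicator_design(x_obs, n_bins=8)
N, lam0 = len(x_obs) - 1, 3

# Least squares (3.14): regress psi_{lam0}(X_{n delta}) on psi(X_{(n-1) delta}).
alpha, *_ = np.linalg.lstsq(psi[:-1], psi[1:, lam0], rcond=None)

# Column lam0 of G_hat^{-1} P_hat, with G_hat and P_hat as in (3.8)-(3.9).
w = np.ones(N + 1)
w[0] = w[-1] = 1.0 / np.sqrt(2.0)
Xw = psi * w[:, None]
G_hat = Xw.T @ Xw / N
C = psi[:-1].T @ psi[1:]
P_hat = (C + C.T) / (2.0 * N)
column = np.linalg.solve(G_hat, P_hat)[:, lam0]

print(np.round(alpha, 3))
print(np.round(column, 3))
```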


3.3.2. Other than wavelet methods. For our projection estimates to work, 
we merely need approximation spaces satisfying the Jackson and Bernstein 
inequalities. Hence, other finite element bases could serve as well. 

The invariant density and the transition density could also be estimated using kernel methods, but the numerical calculation of the eigenpair $(\kappa_1, u_1)$ would then involve an additional discretization step.

3.3.3. Diffusions over the real line. Using a wavelet basis of $L^2(\mathbb{R})$, it is still possible to estimate $\mu$ and $P_\Delta$ over the real line; in particular the eigenvalue characterization (3.10) extends to this case. Hansen, Scheinkman and Touzi (1998) derive the same formulae as (3.5) and (3.6) under ergodicity and boundary conditions, so that a plug-in approach is feasible. However, a theoretical study seems to require much more demanding tools. If the uniform separation of the spectral value $\nu_1$ and a polynomial growth bound for the eigenfunction $u_1(\cdot)$ are ensured, we expect that the same minimax results hold with respect to an $L^2(\mu)$-loss function, where the invariant density $\mu(\cdot)$ is of course parameter-dependent. However, all spectral approximation results have to be reconsidered with extra care, in particular because the $L^2(\mu)$-norms are in general not equivalent for different parameters.

3.3.4. Adaptation to unknown smoothness. The knowledge of the smoothness $s$ that is needed for the construction of our estimators is not realistic in practice. An adaptive estimation of the eigenpair $(u_1(\cdot), \kappa_1)$ and of $\mu(\cdot)$ that yields adaptive estimators of $\sigma(\cdot)$ and $b(\cdot)$ could be obtained by the following modifications. First, the adaptive estimation of $\mu(\cdot)$ in a classical mixing framework is fairly well known [e.g., Tribouley and Viennet (1998)]. Second, taking advantage of the multiresolution structure provided by wavelets, the adaptive estimation of $P_\Delta$ could be obtained by introducing an appropriate thresholding in the estimated matrices on a large approximation space.

3.3.5. Interpretation as an ill-posed problem. One can make the link with ill-posed inverse problems by saying that the estimation of $\mu(\cdot)$ is well-posed (i.e., with achievable rate $N^{-s/(2s+1)}$), whereas for $S(\cdot)$ we need an estimate of the derivative $u_1'(\cdot)$, yielding an ill-posedness degree of 1 ($N^{-s/(2s+3)}$). Observe that the regularity conditions $\sigma\in H^s$ and $b\in H^{s-1}$ translate into $\mu\in H^s$ and $S\in H^s$. The transformation of $(\mu, S)$ to $\sigma^2(\cdot) = 2S(\cdot)/\mu(\cdot)$ is stable [$L^2$-continuous for $S(\cdot)\ge s_0 > 0$], whereas in $b(\cdot) = S'(\cdot)/\mu(\cdot)$ another ill-posed operation (differentiation) occurs with degree 1.

A brief stepwise explanation reads as follows. Step 1, the natural parametrization $(\mu, P_\Delta)$ is well-posed (for $P_\Delta$ in the strong operator norm sense). Step 2, the calculation of the spectral pair $(\kappa_1, u_1)$ is well-posed. Step 3, the differentiation of $u_1$ that determines $S$ has an ill-posedness of degree 1. Step 4, the calculation of $\sigma^2$ from $(\mu, S)$ is well-posed. Step 5, the calculation of $b$ from $(\mu, S)$ is ill-posed of degree 1.

3.3.6. Regularity restrictions on $b(\cdot)$ and $\sigma(\cdot)$. It is noteworthy that in the continuous time or high frequency observation case, the parameter $b(\cdot)$ does not influence the asymptotic behavior of the estimator of $\sigma(\cdot)$ and vice versa: the estimation problems are separated. In our low frequency regime we had to suppose tight regularity connections between $\sigma(\cdot)$ and $b(\cdot)$. This stems from the fact that for the underlying Markov chain $(X_{n\Delta})$ the parameters $\mu(\cdot)$ and $S(\cdot)$ are more natural, and the regularity of these functions depends on the regularity of both $b(\cdot)$ and $\sigma(\cdot)$.
At a different level, in nonparametric regression, different smoothness constraints are needed for the mean and the variance function. Recommended references are Müller and Stadtmüller (1987) and Fan and Yao (1998).

Finally, although we ask for the tight connection $s_1 = s_2 - 1$ between the regularity $s_1$ of the drift $b(\cdot)$ and the regularity $s_2$ of the diffusion coefficient $\sigma(\cdot)$, our results readily carry over to the milder constraint $s_1\ge s_2 - 1$.


3.3.7. Estimation when one parameter is known. If $\sigma(\cdot)$ is known, an estimate $\hat\mu$ of the invariant density yields an estimate of $b(\cdot)$, since

$$b(x) = \frac{\big(\sigma^2(x)\mu(x)\big)'}{2\mu(x)}, \qquad x \in [0,1].$$

Estimation of $\mu \in H^s$, $s > 1$, in $H^1$-norm can be achieved with rate $N^{-(s-1)/(2s+1)}$, and this rate is thus also valid for estimating $b(\cdot)$ in $L^2$-norm. Given the drift coefficient $b(\cdot)$, we find

$$\sigma^2(x) = \frac{2\int_0^x \mu(y)b(y)\,dy + C}{\mu(x)}, \qquad x \in [0,1],$$

where $C$ is a suitable constant. If we knew $\sigma^2(0)$, we would obtain the rate $N^{-s/(2s+1)}$ for $\sigma^2(\cdot)$. Using a preliminary nonparametric estimate $\hat\sigma^2_C$ that depends on the parameter $C$ and then fitting a parametric model for $C$, we are likely to find the same rate. In any case, the assumption of knowing one parameter seems rather artificial, and no further investigations have been performed.
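The two identities above can be checked numerically. The sketch below is an illustration only and is not the authors' procedure. It evaluates $b$ from $(\sigma^2,\mu)$ by finite differences and then recovers $\sigma^2$ from $(b,\mu)$ with $C = \sigma^2(0)\mu(0)$. The test functions chosen for $\mu$ and $\sigma^2$ are hypothetical, and $\mu$ is left unnormalized because neither identity needs the normalization.

```python
import numpy as np

def cumulative_trapezoid0(f, x):
    """Cumulative trapezoidal integral int_{x[0]}^{x[i]} f, starting at 0."""
    increments = 0.5 * (f[1:] + f[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(increments)))

def drift_from_volatility(sigma2, mu, x):
    """b(x) = (sigma^2(x) mu(x))' / (2 mu(x)); derivative by finite differences."""
    return np.gradient(sigma2 * mu, x) / (2.0 * mu)

def volatility_from_drift(b, mu, x, C):
    """sigma^2(x) = (2 int_0^x mu(y) b(y) dy + C) / mu(x)."""
    return (2.0 * cumulative_trapezoid0(mu * b, x) + C) / mu

x = np.linspace(0.0, 1.0, 2001)
mu = 1.0 + 0.5 * x          # hypothetical (unnormalised) invariant density shape
sigma2 = 1.0 + x**2         # hypothetical diffusion coefficient sigma^2
b = drift_from_volatility(sigma2, mu, x)
sigma2_back = volatility_from_drift(b, mu, x, C=sigma2[0] * mu[0])
print(np.max(np.abs(sigma2_back - sigma2)))   # only discretisation error remains
```

If $C$ is unknown, the second map determines $\sigma^2$ only up to the additive term $C/\mu(x)$. This is why $C$ has to be fitted separately.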


3.3.8. Estimation at the boundary. Our plug-in estimators can only be defined on the open subinterval $(0,1)$. Estimation at the boundary points leads to a risk increase, because $S(0) = \nu_1 u_1(0)\mu(0)/u_1''(0)$ by de l'Hôpital's rule applied to (3.4). Thus, estimating $\sigma(0)$ and $b(0)$ with plug-in estimators involves taking the second and third derivative, respectively. A pointwise lower bound result, along the lines of the $L^2$-lower bound proof, shows that this deterioration cannot be avoided.
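The boundary formula can be checked numerically on the reference model of Section 5.1, reflected Brownian motion ($\sigma = 1$, $b = 0$), for which $\mu = 1$ and $S = 1/2$. For this illustration we assume the eigenpair $u_1(x) = -\sqrt2\cos(\pi x)$, $\nu_1 = -\pi^2/2$, which is the standard Neumann eigenpair; the sign is chosen so that $u_1$ is increasing. This eigenpair is our assumption and is not a formula from the text.

```python
import numpy as np

# Reflected Brownian motion on [0,1]: mu = 1, S = 1/2 (sigma = 1, b = 0).
nu1 = -np.pi**2 / 2.0

def u1(x):
    return -np.sqrt(2.0) * np.cos(np.pi * x)

def du1(x):
    return np.sqrt(2.0) * np.pi * np.sin(np.pi * x)

def d2u1(x):
    return np.sqrt(2.0) * np.pi**2 * np.cos(np.pi * x)

def S_interior(x):
    """S(x) = nu_1 int_0^x u_1 mu / u_1'(x) for x in (0,1); the integral is in closed form here."""
    integral = -np.sqrt(2.0) * np.sin(np.pi * x) / np.pi
    return nu1 * integral / du1(x)

def S_at_zero(mu0=1.0):
    """Limit x -> 0 via de l'Hopital: S(0) = nu_1 u_1(0) mu(0) / u_1''(0)."""
    return nu1 * u1(0.0) * mu0 / d2u1(0.0)

print(S_interior(np.array([0.25, 0.5, 0.75])), S_at_zero())   # numerically 0.5 everywhere
```

At $x = 0$ the plug-in ratio itself has the form $0/0$. With estimated eigenfunctions, both the numerator and the denominator are then small and noisy.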


4. Proof of the upper bound.

4.1. Convergence of $\hat\mu$. First, we recall the proof of the risk bound for estimating the invariant measure:

Proposition 4.1. With the choice $2^J \sim N^{1/(2s+1)}$, the following uniform risk estimate holds for $\hat\mu$ based on $N$ observations:

$$\sup_{(\sigma,b)\in\Theta_s}\mathbb{E}_{\sigma,b}\big[\|\hat\mu - \mu\|_{L^2}^2\big] \lesssim N^{-2s/(2s+1)}.$$

Proof. The explicit formula (3.1) for $\mu$ shows that $\|\mu\|_{H^s}$ is uniformly bounded over $\Theta_s$. This implies that the bias term satisfies

$$\|\mu - \pi_J\mu\|_{L^2} \lesssim 2^{-Js}$$

uniformly over $\Theta_s$. Since $\hat\mu_\lambda$ is an unbiased estimator of $\langle\mu,\psi_\lambda\rangle$, we can apply the variance estimates of Lemma 6.2 to obtain

$$\mathbb{E}_{\sigma,b}\big[\|\hat\mu - \pi_J\mu\|_{L^2}^2\big] = \sum_{|\lambda|\le J}\operatorname{Var}_{\sigma,b}[\hat\mu_\lambda] \lesssim 2^J N^{-1}.$$

Combined with the uniformity of the constants involved, this gives the announced upper bound. □
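Below is a minimal sketch of the projection estimator behind Proposition 4.1. For simplicity we assume the Haar scaling functions $\varphi_{J,k} = 2^{J/2}\mathbf 1_{[k2^{-J},(k+1)2^{-J})}$ as a basis of $V_J$, which is our choice and not necessarily the paper's; with this basis, $\hat\mu$ is the histogram with $2^J$ bins. The Haar basis achieves the bias bound $2^{-Js}$ only for $s \le 1$, so for $s > 1$ a smoother multiresolution is needed. The helper names are hypothetical.

```python
import numpy as np

def haar_level(N, s):
    """Level J with 2^J ~ N^{1/(2s+1)}, rounded to the nearest integer J >= 0."""
    return max(0, int(round(np.log2(N) / (2 * s + 1))))

def invariant_density_projection(samples, J):
    """Projection estimator on V_J spanned by phi_{J,k} = 2^{J/2} 1_[k 2^-J, (k+1) 2^-J)."""
    samples = np.asarray(samples, dtype=float)
    n_bins = 2 ** J
    idx = np.minimum((samples * n_bins).astype(int), n_bins - 1)  # x = 1 goes to the last bin
    counts = np.bincount(idx, minlength=n_bins)
    # empirical coefficients mu_hat_k = N^{-1} sum_n phi_{J,k}(X_n)
    coeffs = 2 ** (J / 2) * counts / samples.size

    def mu_hat(x):
        # mu_hat = sum_k mu_hat_k phi_{J,k}: piecewise constant, value 2^J * counts_k / N
        x = np.asarray(x, dtype=float)
        k = np.minimum((x * n_bins).astype(int), n_bins - 1)
        return 2 ** (J / 2) * coeffs[k]

    return coeffs, mu_hat

coeffs, mu_hat = invariant_density_projection((np.arange(1024) + 0.5) / 1024, J=3)
print(mu_hat(np.linspace(0.0, 1.0, 5)))   # evenly spread data: density 1 everywhere
```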

4.2. Spectral approximation. We shall rely on the spectral approximation results given in Chatelin (1983); compare also Kato (1995). For $\sigma(\cdot)$ we have to estimate not only the eigenfunction $u_1$ but also its derivative $u_1'$, so we will be working in the $L^2$-Sobolev space $H^1$. The general idea is that, once the overall error measured in the operator norm is small, the error in the eigenvalue and in the eigenfunction can be controlled by the error of the operator on the eigenspace. Let $R(T,z) = (T - z\,\mathrm{Id})^{-1}$ denote the resolvent map of the operator $T$, let $\sigma(T)$ denote its spectrum, and let $B(x,r)$ denote the closed ball of radius $r$ around $x$.


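Proposition 4.2. Suppose a bounded linear operator $T$ on a Hilbert space has a simple eigenvalue $\kappa$ such that $\sigma(T) \cap B(\kappa,\rho) = \{\kappa\}$ holds for some $\rho > 0$. Let $T_\varepsilon$ be a second linear operator with $\|T_\varepsilon - T\| < R/2$, where $R := \big(\sup_{z\in\partial B(\kappa,\rho)}\|R(T,z)\|\big)^{-1}$. Then the operator $T_\varepsilon$ has a simple eigenvalue $\kappa_\varepsilon$ in $B(\kappa,\rho)$, and there are eigenvectors $u$ and $u_\varepsilon$ with $Tu = \kappa u$, $T_\varepsilon u_\varepsilon = \kappa_\varepsilon u_\varepsilon$ satisfying

$$\|u_\varepsilon - u\| \le \sqrt8\,R^{-1}\,\|(T_\varepsilon - T)u\|. \tag{4.1}$$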


Proof. We use the resolvent identity and the Cauchy integral representation of the spectral projection $P_\varepsilon$ on the eigenspace of $T_\varepsilon$ contained in $B(\kappa,\rho)$ [see Chatelin (1983), Lemma 6.4]. By the usual Neumann series argument we find formally, for an eigenvector $u$ corresponding to $\kappa$,

$$\begin{aligned}
\|u - P_\varepsilon u\| &= \Big\|\frac{1}{2\pi i}\oint_{\partial B(\kappa,\rho)}\frac{R(T_\varepsilon,z)}{\kappa - z}\,dz\,(T_\varepsilon - T)u\Big\| \\
&\le \frac{2\pi\rho}{2\pi}\sup_{z\in\partial B(\kappa,\rho)}\|R(T_\varepsilon,z)\|\,\rho^{-1}\,\|(T_\varepsilon - T)u\| \\
&\le \frac{R^{-1}}{1 - R^{-1}\|T_\varepsilon - T\|}\,\|(T_\varepsilon - T)u\| \\
&= (R - \|T_\varepsilon - T\|)^{-1}\,\|(T_\varepsilon - T)u\|.
\end{aligned}$$

Hence, for $\|T_\varepsilon - T\| < R/2$ this calculation is justified a posteriori and yields $\|u - P_\varepsilon u\| < 2R^{-1}\|(T_\varepsilon - T)u\|$. Applying $\|(T_\varepsilon - T)u\| < (R/2)\|u\|$ once again, we obtain $\|u - P_\varepsilon u\| < \|u\|$, so the projection $P_\varepsilon$ cannot be zero. Consequently, there must be a part of the spectrum of $T_\varepsilon$ in $B(\kappa,\rho)$. By the argument in the proof of Theorem 5.22 in Chatelin (1983), this part consists of a simple eigenvalue $\kappa_\varepsilon$.

It remains to find eigenvectors that are close, too. Observe that, for arbitrary Hilbert space elements $g, h$ with $\|g\| = \|h\| = 1$ and $\langle g,h\rangle \ge 0$,

$$\|g - h\|^2 = 2 - 2\langle g,h\rangle \le 2\big(1 + \langle g,h\rangle\big)\big(1 - \langle g,h\rangle\big) = 2\,\|g - \langle g,h\rangle h\|^2$$

holds. We substitute for $g$ and $h$ the normalized eigenvectors $u$ and $u_\varepsilon$ with $\langle u,u_\varepsilon\rangle \ge 0$. Since oblique projections only enlarge the right-hand side, we infer (4.1). □

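Corollary 4.3. Under the conditions of Proposition 4.2 there is a constant $C = C(R, \|T\|)$ such that $|\kappa_\varepsilon - \kappa| \le C\,\|(T_\varepsilon - T)u\|$.

Proof. The inverse triangle inequality yields

$$\begin{aligned}
|\kappa_\varepsilon - \kappa| &= \big|\,\|T_\varepsilon u_\varepsilon\| - \|Tu\|\,\big| \le \|T_\varepsilon(u_\varepsilon - u) + (T_\varepsilon - T)u\| \\
&\le \big(\|T\| + \|T_\varepsilon - T\|\big)\|u_\varepsilon - u\| + \|(T_\varepsilon - T)u\| \\
&\le \Big(\big(\|T\| + \tfrac{R}{2}\big)\sqrt8\,R^{-1} + 1\Big)\|(T_\varepsilon - T)u\|,
\end{aligned}$$

where the last line follows from Proposition 4.2. □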
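Proposition 4.2 and Corollary 4.3 can be sanity-checked on a finite-dimensional example. The sketch assumes a symmetric matrix $T$ and a symmetric perturbation $E = T_\varepsilon - T$. For self-adjoint $T$ one has $\sup_{z\in\partial B(\kappa,\rho)}\|R(T,z)\| = 1/\min(\rho, \mathrm{gap} - \rho)$, where $\mathrm{gap}$ is the distance from $\kappa$ to the rest of the spectrum. Choosing $\rho = \mathrm{gap}/2$ therefore gives $R = \mathrm{gap}/2$. The helper name is hypothetical.

```python
import numpy as np

def check_perturbation_bound(T, E, k):
    """Check (4.1) and Corollary 4.3 for the k-th eigenpair (ascending order) of symmetric T."""
    vals, vecs = np.linalg.eigh(T)
    kappa, u = vals[k], vecs[:, k]
    gap = np.min(np.abs(np.delete(vals, k) - kappa))
    R = gap / 2.0                              # rho = gap/2 gives R = gap/2 for self-adjoint T
    assert np.linalg.norm(E, 2) < R / 2, "perturbation too large for Proposition 4.2"
    vals_e, vecs_e = np.linalg.eigh(T + E)
    j = np.argmin(np.abs(vals_e - kappa))      # the unique eigenvalue of T_e in B(kappa, rho)
    kappa_e, u_e = vals_e[j], vecs_e[:, j]
    if u_e @ u < 0:                            # choose the sign with <u, u_e> >= 0
        u_e = -u_e
    Eu = np.linalg.norm(E @ u)
    vector_ok = np.linalg.norm(u_e - u) <= np.sqrt(8) / R * Eu                  # (4.1)
    C = (np.linalg.norm(T, 2) + R / 2) * np.sqrt(8) / R + 1                     # constant of Cor. 4.3
    value_ok = abs(kappa_e - kappa) <= C * Eu
    return vector_ok, value_ok

rng = np.random.default_rng(0)
T = np.diag([1.0, 0.6, 0.3, 0.1])
A = rng.standard_normal((4, 4))
E = (A + A.T) / 2
E *= 0.05 / np.linalg.norm(E, 2)               # ||E|| = 0.05 < R/2 = 0.075 for kappa = 0.6
print(check_perturbation_bound(T, E, k=2))
```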

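4.3. Bias estimates. In a first estimation step, we bound the deterministic error due to the finite-dimensional projection $\pi_J^\mu P_\Delta$ of $P_\Delta$. We start with a lemma stating that $\pi_J^\mu$ and $\pi_J$ have similar approximation properties.

Lemma 4.4. Let $m\colon[0,1] \to [m_0, m_1]$ be a measurable function with $m_1 \ge m_0 > 0$. Denote by $\pi_J^m$ the $L^2(m)$-orthogonal projection onto the multiresolution space $V_J$. Then there is a constant $C = C(m_0, m_1)$ such that

$$\|(\mathrm{Id} - \pi_J^m)f\|_{H^1} \le C\,\|(\mathrm{Id} - \pi_J)f\|_{H^1} \qquad \forall f \in H^1([0,1]).$$

Proof. The norms $\|g\|_{L^2(m)}$ and $\|g\|_{L^2}$ are equivalent, with constants depending only on $m_0$ and $m_1$. This implies that $\|\pi_J^m\|_{L^2\to L^2}$ is bounded by a constant depending only on $m_0$ and $m_1$. On the other hand, the Bernstein inequality in $V_J$ and the Jackson inequality for $\mathrm{Id} - \pi_J$ in $L^2$ and $H^1$ yield, for $f \in H^1$,

$$\begin{aligned}
\|(\mathrm{Id} - \pi_J^m)f\|_{H^1} &= \|(\mathrm{Id} - \pi_J^m)(\mathrm{Id} - \pi_J)f\|_{H^1} \\
&\le \|(\mathrm{Id} - \pi_J)f\|_{H^1} + \|\pi_J^m(\mathrm{Id} - \pi_J)f\|_{H^1} \\
&\lesssim \|(\mathrm{Id} - \pi_J)f\|_{H^1} + 2^J\|\pi_J^m(\mathrm{Id} - \pi_J)f\|_{L^2} \\
&\lesssim \|(\mathrm{Id} - \pi_J)f\|_{H^1} + \|\pi_J^m\|_{L^2\to L^2}\,\|(\mathrm{Id} - \pi_J)f\|_{H^1},
\end{aligned}$$

where the constants depend only on the approximation spaces. □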

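Proposition 4.5. Uniformly over $\Theta_s$ we have $\|\pi_J^\mu P_\Delta - P_\Delta\|_{H^1\to H^1} \lesssim 2^{-Js}$.

Proof. The transition density $p_\Delta$ is the kernel of the operator $P_\Delta$. Hence, Proposition 6.7 implies that $P_\Delta\colon H^1 \to H^{s+1}$ is continuous, with a norm bound that is uniform over $\Theta_s$. Lemma 4.4 yields

$$\|(P_\Delta - \pi_J^\mu P_\Delta)f\|_{H^1} \lesssim \|\mathrm{Id} - \pi_J\|_{H^{s+1}\to H^1}\,\|f\|_{H^1}.$$

The Jackson inequality in $H^{s+1}$ gives the result. □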

Corollary 4.6. Let $\kappa_1^J$ be the largest eigenvalue smaller than 1 of $\pi_J^\mu P_\Delta$, with eigenfunction $u_1^J$. Then the following estimate holds uniformly over $\Theta_s$:

$$|\kappa_1^J - \kappa_1| + \|u_1^J - u_1\|_{H^1} \lesssim 2^{-Js}.$$

Proof. We are going to apply Proposition 4.2 on the space $H^1$, together with its Corollary 4.3. In view of Proposition 4.5, it remains to establish the existence of values for $\rho$ and $R$ that are strictly positive uniformly over $\Theta_s$. The uniform separation of $\kappa_1$ from the rest of the spectrum is the content of Proposition 6.5.

For the choice of $\rho$ in Proposition 6.5 we seek a uniform bound for $R$. If we regard $P_\Delta$ on $L^2(\mu)$, then $P_\Delta$ is self-adjoint and satisfies $\|R(P_\Delta,z)\| = \operatorname{dist}(z,\sigma(P_\Delta))^{-1}$ [see Chatelin (1983), Proposition 2.32].

By Lemma 6.3 and the commutativity between $P_\Delta$ and $L$ we conclude

$$\begin{aligned}
\|R(P_\Delta,z)f\|_{H^1} &\sim \|(\mathrm{Id} - L)^{1/2}R(P_\Delta,z)f\|_\mu \\
&\le \|R(P_\Delta,z)\|\,\|(\mathrm{Id} - L)^{1/2}f\|_\mu \\
&\sim \operatorname{dist}(z,\sigma(P_\Delta))^{-1}\,\|f\|_{H^1}.
\end{aligned}$$

Hence, $\|R(P_\Delta,z)\|_{H^1\to H^1} \lesssim \rho^{-1}$ holds uniformly over $z \in \partial B(\kappa_1,\rho)$ and $(\sigma,b) \in \Theta_s$. □


Remark 4.7. The Kato-Temple inequality [Chatelin (1983), Theorem 6.21] on $L^2(\mu)$ even establishes the so-called superconvergence of the eigenvalue error $|\kappa_1^J - \kappa_1|$.

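4.4. Variance estimates. To bound the stochastic error on the finite-dimensional space $V_J$, we return to vector-matrix notation and look for a bound on the error of the estimators $\hat G$ and $\hat P_\Delta$. The Euclidean norm is denoted by $\|\cdot\|_{\ell^2}$.

Lemma 4.8. For any vector $v$ we have, uniformly over $\Theta_s$,

$$\mathbb{E}_{\sigma,b}\big[\|(\hat G - G)v\|_{\ell^2}^2\big] \lesssim N^{-1}2^J\|v\|_{\ell^2}^2.$$

Proof. We identify $v$ with the function $v(x) = \sum_\lambda v_\lambda\psi_\lambda(x)$ and note that $(\hat Gv)_\lambda$ is the empirical average of the values $\psi_\lambda(X_{n\Delta})v(X_{n\Delta})$ that defines $\hat G$. By (6.1) in Lemma 6.2 we obtain

$$\begin{aligned}
\mathbb{E}_{\sigma,b}\big[\|(\hat G - G)v\|_{\ell^2}^2\big] &= \sum_{|\lambda|\le J}\mathbb{E}_{\sigma,b}\Big[\big((\hat Gv)_\lambda - \mathbb{E}_{\sigma,b}[\psi_\lambda(X_0)v(X_0)]\big)^2\Big] \\
&\lesssim \sum_{|\lambda|\le J}N^{-1}\,\mathbb{E}_{\sigma,b}\big[(\psi_\lambda(X_0)v(X_0))^2\big] \\
&\le N^{-1}\Big\|\sum_{|\lambda|\le J}\psi_\lambda^2\Big\|_\infty\|v\|_{L^2}^2\,\|\mu\|_\infty \lesssim N^{-1}2^J\|v\|_{\ell^2}^2,
\end{aligned}$$

as asserted. □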

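Lemma 4.9. For any vector $v$ we have, uniformly over $\Theta_s$,

$$\mathbb{E}_{\sigma,b}\big[\|(\hat P_\Delta - P_\Delta)v\|_{\ell^2}^2\big] \lesssim \|v\|_{\ell^2}^2N^{-1}2^J.$$

Proof. By (6.2) in Lemma 6.2 we obtain

$$\begin{aligned}
\mathbb{E}_{\sigma,b}\big[\|(\hat P_\Delta - P_\Delta)v\|_{\ell^2}^2\big] &= \sum_{|\lambda|\le J}\mathbb{E}_{\sigma,b}\bigg[\Big(\frac1N\sum_{n=1}^N\psi_\lambda(X_{(n-1)\Delta})v(X_{n\Delta}) - \mathbb{E}_{\sigma,b}[\psi_\lambda(X_0)v(X_\Delta)]\Big)^2\bigg] \\
&\lesssim \sum_{|\lambda|\le J}N^{-1}\,\mathbb{E}_{\sigma,b}\big[(\psi_\lambda(X_0)v(X_\Delta))^2\big] \\
&\le N^{-1}\sum_{|\lambda|\le J}\|\psi_\lambda\|_{L^2}^2\,\|v\|_{L^2}^2\,\|\mu\,p_\Delta\|_\infty \lesssim N^{-1}2^J\|v\|_{\ell^2}^2,
\end{aligned}$$

as asserted. □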

Definition 4.10. We introduce the random set

$$\mathcal{R} = \mathcal{R}_{J,N} := \big\{\|\hat G - G\| \le \tfrac12\|G^{-1}\|^{-1}\big\}.$$

Remark 4.11. Since $G$ is invertible, the usual Neumann series argument shows that $\hat G$ is invertible on $\mathcal{R}$, with $\|\hat G^{-1}\| \le 2\|G^{-1}\|$.

Lemma 4.12. Uniformly over $\Theta_s$ we have $\mathbb{P}_{\sigma,b}(\Omega \setminus \mathcal{R}) \lesssim N^{-1}2^{2J}$.

Proof. By the classical Hilbert-Schmidt norm inequality,

$$\|\hat G - G\|_{\ell^2\to\ell^2}^2 \le \sum_{|\lambda|\le J}\|(\hat G - G)e_\lambda\|_{\ell^2}^2$$

holds with unit vectors $(e_\lambda)$. Lemma 4.8 then gives, uniformly,

$$\mathbb{E}_{\sigma,b}\big[\|\hat G - G\|^2\big] \lesssim N^{-1}2^{2J}.$$

The spaces $L^2([0,1])$ and $L^2(\mu)$ are isomorphic with uniform isomorphism constants, so $\|G^{-1}\| \sim 1$ holds uniformly over $\Theta_s$. The assertion then follows from Chebyshev's inequality. □
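Remark 4.11 is the standard Neumann series bound. On $\mathcal R$ we can write $\hat G = G\big(\mathrm{Id} + G^{-1}(\hat G - G)\big)$ with $\|G^{-1}(\hat G - G)\| \le 1/2$, and therefore $\|\hat G^{-1}\| \le 2\|G^{-1}\|$. The following numerical illustration uses a randomly generated, hypothetical Gram matrix and spectral norms.

```python
import numpy as np

def neumann_check(G, G_hat):
    """Return (G_hat lies in the event R, ||G_hat^{-1}|| <= 2 ||G^{-1}||), using spectral norms."""
    inv_norm = np.linalg.norm(np.linalg.inv(G), 2)
    on_event = np.linalg.norm(G_hat - G, 2) <= 0.5 / inv_norm
    bound_holds = np.linalg.norm(np.linalg.inv(G_hat), 2) <= 2.0 * inv_norm
    return on_event, bound_holds

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
G = A @ A.T + 6.0 * np.eye(6)                          # hypothetical positive definite Gram matrix
D = rng.standard_normal((6, 6))
radius = 0.5 / np.linalg.norm(np.linalg.inv(G), 2)
G_hat = G + 0.99 * radius * D / np.linalg.norm(D, 2)   # perturbation just inside R
print(neumann_check(G, G_hat))
```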

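Proposition 4.13. For any $\varepsilon > 0$ we have, uniformly over $\Theta_s$,

$$\mathbb{P}_{\sigma,b}\big(\mathcal{R}\cap\{\|\hat G^{-1}\hat P_\Delta - G^{-1}P_\Delta\| > \varepsilon\}\big) \lesssim N^{-1}2^{2J}\varepsilon^{-2}.$$

Proof. First, we separate the different error terms:

$$\hat G^{-1}\hat P_\Delta - G^{-1}P_\Delta = \hat G^{-1}(\hat P_\Delta - P_\Delta) + (\hat G^{-1} - G^{-1})P_\Delta = \hat G^{-1}\big((\hat P_\Delta - P_\Delta) + (G - \hat G)G^{-1}P_\Delta\big).$$

On the set $\mathcal{R}$ we obtain, by Remark 4.11,

$$\begin{aligned}
\|\hat G^{-1}\hat P_\Delta - G^{-1}P_\Delta\| &\le \|\hat G^{-1}\|\big(\|\hat P_\Delta - P_\Delta\| + \|\hat G - G\|\,\|G^{-1}P_\Delta\|\big) \\
&\le 2\|G^{-1}\|\big(\|\hat P_\Delta - P_\Delta\| + \|\hat G - G\|\,\|G^{-1}P_\Delta\|\big) \\
&\lesssim \|\hat P_\Delta - P_\Delta\| + \|\hat G - G\|.
\end{aligned}$$

By Lemmas 4.8 and 4.9 and the Hilbert-Schmidt norm estimate (cf. the proof of Lemma 4.12), we obtain the following norm bound, uniformly over $\Theta_s$:

$$\mathbb{E}_{\sigma,b}\big[\|\hat G^{-1}\hat P_\Delta - G^{-1}P_\Delta\|^2\,\mathbf 1_{\mathcal{R}}\big] \lesssim N^{-1}2^{2J}.$$

It remains to apply Chebyshev's inequality. □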
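The error separation above is a purely algebraic identity, and the norm bound follows from it on $\mathcal R$. The sketch below checks both on random, hypothetical matrices $G$, $P$ and perturbations chosen so that $\hat G$ lies in $\mathcal R$.

```python
import numpy as np

def error_terms(G, P, G_hat, P_hat):
    """Both sides of G_hat^{-1} P_hat - G^{-1} P = G_hat^{-1}((P_hat - P) + (G - G_hat) G^{-1} P)."""
    lhs = np.linalg.solve(G_hat, P_hat) - np.linalg.solve(G, P)
    rhs = np.linalg.solve(G_hat, (P_hat - P) + (G - G_hat) @ np.linalg.solve(G, P))
    return lhs, rhs

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)                            # hypothetical Gram matrix
P = rng.standard_normal((n, n))                        # hypothetical transition matrix
radius = 0.5 / np.linalg.norm(np.linalg.inv(G), 2)
E = rng.standard_normal((n, n))
G_hat = G + 0.9 * radius * E / np.linalg.norm(E, 2)    # G_hat on the event R
P_hat = P + 0.01 * rng.standard_normal((n, n))
lhs, rhs = error_terms(G, P, G_hat, P_hat)

# norm bound used on R: ||lhs|| <= 2 ||G^{-1}|| (||P_hat - P|| + ||G_hat - G|| ||G^{-1} P||)
G_inv_norm = np.linalg.norm(np.linalg.inv(G), 2)
bound = 2 * G_inv_norm * (np.linalg.norm(P_hat - P, 2)
                          + np.linalg.norm(G_hat - G, 2) * np.linalg.norm(np.linalg.solve(G, P), 2))
print(np.allclose(lhs, rhs), np.linalg.norm(lhs, 2) <= bound)
```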

Having established the weak consistency of the estimators in matrix norm, 
we now bound the error on the eigenspace. 

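Proposition 4.14. Let $u_1^J$ also denote the coefficient vector associated with the normalized eigenfunction $u_1^J$ of $\pi_J^\mu P_\Delta$ with eigenvalue $\kappa_1^J$. Then the following risk bound holds uniformly over $\Theta_s$:

$$\mathbb{E}_{\sigma,b}\big[\|(\hat G^{-1}\hat P_\Delta - G^{-1}P_\Delta)u_1^J\|_{\ell^2}^2\,\mathbf 1_{\mathcal{R}}\big] \lesssim N^{-1}2^J.$$

Proof. We use the same separation of the error terms on $\mathcal{R}$ as in the preceding proof, together with $G^{-1}P_\Delta u_1^J = \kappa_1^Ju_1^J$. By Lemmas 4.8 and 4.9 we then find

$$\mathbb{E}_{\sigma,b}\big[\|(\hat G^{-1}\hat P_\Delta - G^{-1}P_\Delta)u_1^J\|_{\ell^2}^2\,\mathbf 1_{\mathcal{R}}\big] \le 8\|G^{-1}\|^2\Big(\mathbb{E}\big[\|(\hat P_\Delta - P_\Delta)u_1^J\|_{\ell^2}^2\big] + \mathbb{E}\big[\|(\hat G - G)\kappa_1^Ju_1^J\|_{\ell^2}^2\big]\Big) \lesssim N^{-1}2^J.$$

The uniformity over $\Theta_s$ follows from the respective statements in the lemmas. □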

Corollary 4.15. Let $\hat\kappa_1$ be the second largest eigenvalue of the matrix $\hat G^{-1}\hat P_\Delta$, with eigenvector $\hat u_1$. If $\hat G$ is not invertible or if $\|\hat u_1\|_{\ell^2} > 2\sup_{\Theta_s}\|u_1^J\|_{\ell^2}$ holds, put $\hat u_1 := 0$ and $\hat\kappa_1 := 0$. If $N^{-1}2^{2J} \to 0$ holds, then the following bounds hold uniformly over $\Theta_s$ for $N, J \to \infty$:

$$\mathbb{E}_{\sigma,b}\big[\big(|\hat\kappa_1 - \kappa_1^J|^2 + \|\hat u_1 - u_1^J\|_{\ell^2}^2\big)\mathbf 1_{\mathcal{R}}\big] \lesssim N^{-1}2^J, \tag{4.2}$$

$$\mathbb{E}_{\sigma,b}\big[\|\hat u_1 - u_1^J\|_{H^1}^2\big] \lesssim N^{-1}2^{3J}. \tag{4.3}$$

Proof. To prove (4.2), we apply Proposition 4.2 and Corollary 4.3 on the Euclidean Hilbert space. Proposition 4.14, in connection with Proposition 4.13 (using $\varepsilon < R/2$ and $N^{-1}2^{2J} \to 0$), then yields the correct asymptotic rate on the event $\mathcal{R}$. For the uniform choice of $\rho$ and $R$ for $G^{-1}P_\Delta$ in Proposition 4.2, it suffices to use the corresponding result for $P_\Delta$ and the convergence $\|\pi_J^\mu P_\Delta - P_\Delta\| \to 0$.

The precaution taken for undefined or too large $\hat u_1$ is necessary on the event $\Omega \setminus \mathcal{R}$. Since the estimators $\hat\kappa_1$ and $\hat u_1$ are now kept artificially bounded, the rate $\mathbb{P}_{\sigma,b}(\Omega\setminus\mathcal{R}) \lesssim N^{-1}2^{2J}$ established in Lemma 4.12 suffices to bound the risk on $\Omega\setminus\mathcal{R}$. Hence, the second estimate (4.3) is a consequence of (4.2) and the Bernstein inequality $\|\hat u_1 - u_1^J\|_{H^1} \lesssim 2^J\|\hat u_1 - u_1^J\|_{L^2}$. □
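The full spectral estimator can be sketched end to end on simulated data. The illustration rests on three assumptions:

- The data come from reflected Brownian motion on $[0,1]$, the reference model $(\sigma,b) = (1,0)$ of Section 5.1. It is sampled exactly by folding a Brownian path at 0 and 1, and for this model $\kappa_1 = e^{-\pi^2\Delta/2}$.
- The basis is the histogram basis $\psi_\lambda = \sqrt M\,\mathbf 1_{I_\lambda}$ on $M$ equal bins, used in place of a smooth wavelet basis.
- $\hat G$ and $\hat P_\Delta$ are plain empirical averages over the $N$ transitions; the exact form of $\hat G$ from Section 3 is not reproduced here.

All function names are hypothetical.

```python
import numpy as np

def fold(x):
    """Reflect real numbers into [0,1] (tent map with period 2)."""
    y = np.mod(x, 2.0)
    return np.where(y > 1.0, 2.0 - y, y)

def simulate_reflected_bm(N, delta, x0=0.5, seed=0):
    """Exact samples X_0, X_delta, ..., X_{N delta} of reflected Brownian motion on [0,1]."""
    rng = np.random.default_rng(seed)
    increments = np.sqrt(delta) * rng.standard_normal(N)
    return fold(x0 + np.concatenate(([0.0], np.cumsum(increments))))

def empirical_matrices(X, M):
    """G_hat and P_hat for the histogram basis sqrt(M) 1_{bin}, averaged over the N transitions."""
    N = X.size - 1
    idx = np.minimum((X * M).astype(int), M - 1)
    G_hat = np.diag(M * np.bincount(idx[:-1], minlength=M) / N)   # (1/N) sum psi_l(X_n)^2
    P_hat = np.zeros((M, M))
    np.add.at(P_hat, (idx[:-1], idx[1:]), M / N)                 # (1/N) sum psi_l(X_{n-1}) psi_l'(X_n)
    return G_hat, P_hat

def second_eigenpair(G_hat, P_hat):
    """(kappa_hat_1, u_hat_1) of G_hat^{-1} P_hat; (0, 0) if G_hat is singular, as in Corollary 4.15."""
    if np.any(np.diag(G_hat) == 0):
        return 0.0, np.zeros(G_hat.shape[0])
    vals, vecs = np.linalg.eig(np.linalg.solve(G_hat, P_hat))
    order = np.argsort(-vals.real)
    return vals.real[order[1]], vecs.real[:, order[1]]

delta = 0.1
X = simulate_reflected_bm(100_000, delta, seed=0)
G_hat, P_hat = empirical_matrices(X, M=16)
kappa_hat, u_hat = second_eigenpair(G_hat, P_hat)
print(kappa_hat, np.exp(-np.pi**2 * delta / 2))   # estimate vs. true kappa_1
print(np.log(kappa_hat) / delta)                  # nu_1 = log(kappa_1) / Delta, true value -pi^2/2
```

Because the histogram functions have disjoint supports, $\hat G$ is diagonal. In this basis, $\hat G^{-1}\hat P_\Delta$ is the row-normalized matrix of bin-to-bin transition counts, so its largest eigenvalue is exactly 1.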

Remark 4.16. The main result of this section, namely (4.3), can be extended to $p$th moments for all $p \in (1,\infty)$:

$$\mathbb{E}_{\sigma,b}\big[\|\hat u_1 - u_1^J\|_{H^1}^p\big]^{1/p} \lesssim N^{-1/2}2^{3J/2}. \tag{4.4}$$

Indeed, tracing back the steps, it suffices to bound the moments of order $p$ in Lemmas 4.8 and 4.9, which in turn rely on the mixing statement in Lemma 6.2. By arguments based on the Riesz convexity theorem, this last lemma generalizes to the corresponding bounds for $p$th moments, as derived in Section VII.4 of Rosenblatt (1971). For the sake of clarity we have restricted ourselves to the case $p = 2$ here.

4.5. Upper bound for $\sigma^2(\cdot)$. By Corollary 4.15 and our choice $2^J \sim N^{1/(2s+3)}$ of $J$,

$$\sup_{(\sigma,b)\in\Theta_s}\mathbb{E}_{\sigma,b}\big[|\hat\kappa_1 - \kappa_1|^2 + \|\hat u_1 - u_1\|_{H^1}^2\big] \lesssim N^{-1}2^{3J} \sim N^{-2s/(2s+3)}$$

holds. Using this estimate and the estimate for $\hat\mu$ in Proposition 4.1, the risk of the plug-in estimator $\hat\sigma^2(\cdot)$ in (3.12) is bounded as asserted in the theorem. We only have to ensure two things: that the stochastic error does not increase through the plug-in, and that the denominator is uniformly bounded away from zero. The first point causes no problem, by the Cauchy-Schwarz inequality and Remark 4.16 on the higher moments of our estimators. The second point is handled with the lower bound $C_{a,b} > 0$ from Proposition 6.5: using the modified denominator

$$\widetilde{\mu u_1'} := \max\{\hat\mu\hat u_1',\,C_{a,b}\} \tag{4.5}$$

instead of $\hat\mu\hat u_1'$ guarantees a uniform lower bound away from zero.

4.6. Upper bound for $b(\cdot)$. Since $b(\cdot) = S'(\cdot)/\mu(\cdot)$ holds, it suffices to discuss how to estimate $S'(\cdot)$, which amounts to estimating the eigenfunction $u_1$ in $H^2$-norm; compare with (3.6). Substituting $H^2$ for $H^1$ in Proposition 4.5 and its proof, we obtain the bound

$$\|\pi_J^\mu P_\Delta - P_\Delta\|_{H^2\to H^2} \lesssim 2^{-J(s-1)},$$

because $\|\mathrm{Id} - \pi_J\|_{H^{s+1}\to H^2}$ is of this order. As in Corollary 4.6, this is also the rate for the bias. The only fine point is the uniform norm equivalence $\|f\|_{H^2} \sim \|(\mathrm{Id} - L)f\|_{L^2}$ for $f \in \mathrm{dom}(L)$. It follows from the perturbation and similarity arguments given in Section VI.4b of Engel and Nagel (2000); we omit further details.

The variance estimate is exactly the same. From (4.2), Bernstein's inequality for $H^2$ and the estimate of $\mathbb{P}_{\sigma,b}(\Omega\setminus\mathcal{R})$, we infer

$$\mathbb{E}_{\sigma,b}\big[\|\hat u_1 - u_1^J\|_{H^2}^2\big] \lesssim N^{-1}2^{5J}.$$

Therefore, balancing the bias and variance parts of the risk by the choice $2^J \sim N^{1/(2s+3)}$, as before, yields the asserted rate of convergence $N^{-(s-1)/(2s+3)}$.
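The bias-variance balancing in Sections 4.5 and 4.6 reduces to a simple calculation. The sketch minimizes the squared-risk proxy $2^{-2J\beta} + N^{-1}2^{\gamma J}$ over integer $J$, with $(\beta,\gamma) = (s,3)$ for $\sigma^2$ and $(s-1,5)$ for $b$, taking the exponents from the bounds above. In both cases it recovers $2^J \sim N^{1/(2s+3)}$. The function name is hypothetical.

```python
import numpy as np

def balanced_level(N, bias_exp, var_exp, J_max=40):
    """argmin over J of the squared-risk proxy 2^{-2 J bias_exp} + N^{-1} 2^{var_exp J}."""
    J = np.arange(J_max + 1)
    risk = 2.0 ** (-2.0 * bias_exp * J) + 2.0 ** (var_exp * J) / N
    return int(np.argmin(risk)), risk

s, N = 2.0, 2**21
J_sigma, risk_sigma = balanced_level(N, bias_exp=s, var_exp=3)   # sigma^2: bias 2^{-Js}, variance N^{-1} 2^{3J}
J_b, risk_b = balanced_level(N, bias_exp=s - 1, var_exp=5)       # b: bias 2^{-J(s-1)}, variance N^{-1} 2^{5J}
print(J_sigma, J_b, np.log2(N) / (2 * s + 3))                    # both equal log2(N)/(2s+3) = 3 here
```

At the balanced level the squared risk is twice $N^{-2s/(2s+3)}$ for $\sigma^2$ and twice $N^{-2(s-1)/(2s+3)}$ for $b$, which matches the two rates.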

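5. Proof of the lower bounds. First, the usual Bayes prior technique reduces the problem to bounding certain likelihood ratios. This problem is then reduced to bounding the $L^2$-distance between the transition probabilities, which is finally accomplished using Hilbert-Schmidt norm estimates and the explicit form of the inverse of the generator.

5.1. The least favorable prior. The idea is to perturb the drift and diffusion coefficients of the reflected Brownian motion in such a way that the invariant measure remains unchanged. Let us assume that $\psi$ is a compactly supported wavelet in $H^{s+1}$ with one vanishing moment. Without loss of generality we suppose $C > 1 > c > 0$, so that $(1,0) \in \Theta_s$ holds. We put $\psi_{jk} = 2^{j/2}\psi(2^j\cdot - k)$ and denote by $K_j \subset \mathbb{Z}$ a maximal set of indices $k$ such that $\operatorname{supp}(\psi_{jk}) \subset [a,b]$ and $\operatorname{supp}(\psi_{jk}) \cap \operatorname{supp}(\psi_{jk'}) = \emptyset$ hold for all $k, k' \in K_j$, $k \ne k'$. Furthermore, we choose $\gamma \sim 2^{-j(s+1/2)}$ such that, for all $\varepsilon = \varepsilon(j) \in \{-1,+1\}^{|K_j|}$,

$$\big((2S_\varepsilon)^{1/2},\,S_\varepsilon'\big) \in \Theta_s \quad\text{with}\quad S_\varepsilon(x) := S_\varepsilon(j,x) := \Big(2 + \gamma\sum_{k\in K_j}\varepsilon_k\psi_{jk}(x)\Big)^{-1}.$$

We consider the corresponding diffusions with generator

$$L_{S_\varepsilon}f(x) := (S_\varepsilon f')'(x) = S_\varepsilon(x)f''(x) + S_\varepsilon'(x)f'(x), \qquad f \in \mathrm{dom}(L).$$

Hence, the invariant measure is the Lebesgue measure on $[0,1]$.

The usual Assouad cube techniques [e.g., see Korostelev and Tsybakov (1993)] give, for any estimators $\hat\sigma^2(\cdot)$, $\hat b(\cdot)$ and for $j \in \mathbb{N}$, $\rho > 0$, the lower bounds

$$\sup_{(\sigma,b)\in\Theta_s}\mathbb{E}_{\sigma,b}\big[\|\hat\sigma^2 - \sigma^2\|_{L^2([a,b])}^2\big] \gtrsim 2^j\delta_\sigma^2e^{-\rho}p_0, \tag{5.1}$$

$$\sup_{(\sigma,b)\in\Theta_s}\mathbb{E}_{\sigma,b}\big[\|\hat b - b\|_{L^2([a,b])}^2\big] \gtrsim 2^j\delta_b^2e^{-\rho}p_0. \tag{5.2}$$

Here, for all $\varepsilon, \varepsilon'$ with $\|\varepsilon - \varepsilon'\|_{\ell^1} = 2$ and with $\mathbb{P}_\varepsilon := \mathbb{P}_{(2S_\varepsilon)^{1/2},\,S_\varepsilon'}$, the quantities $\delta_\sigma$, $\delta_b$ and $p_0$ satisfy

$$\delta_\sigma \le \|2S_{\varepsilon'} - 2S_\varepsilon\|_{L^2}, \qquad \delta_b \le \|S_{\varepsilon'}' - S_\varepsilon'\|_{L^2}, \qquad \mathbb{P}_\varepsilon\Big(\frac{d\mathbb{P}_{\varepsilon'}}{d\mathbb{P}_\varepsilon} \ge e^{-\rho}\Big) \ge p_0.$$

We choose $\delta_\sigma \sim \gamma$. This is possible because, for $x \in \operatorname{supp}(\psi_{jk})$ with $\varepsilon_k \ne \varepsilon_k'$,

$$S_{\varepsilon'}(x) - S_\varepsilon(x) = \pm2\gamma\psi_{jk}(x)S_{\varepsilon'}(x)S_\varepsilon(x),$$

and $S_\varepsilon, S_{\varepsilon'} \sim 1$ holds uniformly, so the $L^2$-norm is indeed of order $\gamma$. In the same way, we find $\delta_b \sim 2^j\gamma$. Since $\gamma \sim 2^{-j(s+1/2)}$, the proof of the theorem is complete once we have shown that a strictly positive $p_0$ can be chosen for fixed $\rho > 0$ and the asymptotics $2^j \sim N^{1/(2s+3)}$.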


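5.2. Reduction to the convergence of transition densities. Denote the transition probability densities $\mathbb{P}_\varepsilon(X_\Delta \in dy \mid X_0 = x)$ by $p_\varepsilon(x,y)\,dy$, and denote the transition density of reflected Brownian motion by $p_{BM}$. Since $\|S_\varepsilon - \tfrac12\|_{C^1} \to 0$ for $s > 1$, Proposition 6.4 gives

$$\lim_{j\to\infty}\sup_\varepsilon\|p_\varepsilon - p_{BM}\|_\infty = 0.$$

We are now going to use the estimate $-\log(1+x) \le x^2 - x$, which is valid for all $x \ge -\tfrac12$. For $j$ so large that $\|1 - p_{\varepsilon'}/p_\varepsilon\|_\infty \le \tfrac12$ holds, the Kullback-Leibler distance can be bounded from above (note that the invariant measure is Lebesgue measure):

$$\begin{aligned}
\mathbb{E}_\varepsilon\Big[-\log\frac{d\mathbb{P}_{\varepsilon'}}{d\mathbb{P}_\varepsilon}\Big] &= -\sum_{k=1}^N\mathbb{E}_\varepsilon\Big[\log\frac{p_{\varepsilon'}(X_{(k-1)\Delta},X_{k\Delta})}{p_\varepsilon(X_{(k-1)\Delta},X_{k\Delta})}\Big] \\
&= -N\int_0^1\!\!\int_0^1\log\Big(\frac{p_{\varepsilon'}(x,y)}{p_\varepsilon(x,y)}\Big)p_\varepsilon(x,y)\,dy\,dx \\
&\le N\int_0^1\!\!\int_0^1\Big(\frac{(p_{\varepsilon'}(x,y) - p_\varepsilon(x,y))^2}{p_\varepsilon(x,y)^2} - \frac{p_{\varepsilon'}(x,y) - p_\varepsilon(x,y)}{p_\varepsilon(x,y)}\Big)p_\varepsilon(x,y)\,dy\,dx \\
&= N\int_0^1\!\!\int_0^1\frac{(p_{\varepsilon'}(x,y) - p_\varepsilon(x,y))^2}{p_\varepsilon(x,y)}\,dy\,dx \\
&\le N\,\|p_\varepsilon^{-1}\|_\infty\,\|p_{\varepsilon'} - p_\varepsilon\|_{L^2([0,1]^2)}^2.
\end{aligned}$$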

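Up to a constant, the square root of the Kullback-Leibler distance bounds the total variation distance. Together with the Chebyshev inequality, this yields

$$\begin{aligned}
\mathbb{P}_\varepsilon\Big(\frac{d\mathbb{P}_{\varepsilon'}}{d\mathbb{P}_\varepsilon}\Big|_{\mathcal{F}_N} \ge e^{-\rho}\Big) &= 1 - \mathbb{P}_\varepsilon\Big(1 - \frac{d\mathbb{P}_{\varepsilon'}}{d\mathbb{P}_\varepsilon}\Big|_{\mathcal{F}_N} > 1 - e^{-\rho}\Big) \\
&\ge 1 - (1 - e^{-\rho})^{-1}\,\mathbb{E}_\varepsilon\Big[\Big|\frac{d\mathbb{P}_{\varepsilon'}}{d\mathbb{P}_\varepsilon}\Big|_{\mathcal{F}_N} - 1\Big|\Big] \\
&= 1 - (1 - e^{-\rho})^{-1}\,\big\|(\mathbb{P}_{\varepsilon'} - \mathbb{P}_\varepsilon)|_{\mathcal{F}_N}\big\|_{TV} \\
&\ge 1 - CN^{1/2}\,\|p_{\varepsilon'} - p_\varepsilon\|_{L^2([0,1]^2)},
\end{aligned}$$

where $\mathcal{F}_N$ denotes the $\sigma$-field generated by the observations and $C > 0$ is some constant independent of $\gamma$, $N$, $\varepsilon$ and $j$. Summarizing, we need the estimate

$$\limsup_{N,j\to\infty}N^{1/2}\,\|p_{\varepsilon'} - p_\varepsilon\|_{L^2([0,1]^2)} < C^{-1} \qquad\text{for } 2^j \sim N^{1/(2s+3)}. \tag{5.3}$$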
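The Kullback-Leibler bound of this subsection rests on two elementary facts that can be checked numerically. The first is the inequality $-\log(1+x) \le x^2 - x$ for $x \ge -1/2$. The second is its consequence for discrete laws: $\mathrm{KL}(p\,\|\,q) \le \sum (q-p)^2/p$ whenever $q/p - 1 \ge -1/2$. The probability vectors in the sketch are hypothetical stand-ins for discretized transition laws.

```python
import numpy as np

# Elementary inequality: -log(1 + x) <= x^2 - x for x >= -1/2.
x = np.linspace(-0.5, 10.0, 100_001)
slack = (x**2 - x) + np.log1p(x)               # should be >= 0 on the whole grid

def kl_and_chi2(p, q):
    """KL(p||q) = -sum p log(q/p) and the upper bound sum (q - p)^2 / p (valid if q/p - 1 >= -1/2)."""
    return -np.sum(p * np.log(q / p)), np.sum((q - p) ** 2 / p)

rng = np.random.default_rng(4)
p = rng.dirichlet(np.ones(50))                 # hypothetical discretised transition law
r = rng.uniform(-1.0, 1.0, size=50)
q = p * (1.0 + 0.2 * (r - np.sum(p * r)))      # still sums to 1, and q/p - 1 lies in [-0.4, 0.4]
kl, chi2 = kl_and_chi2(p, q)
print(slack.min() >= -1e-12, kl <= chi2)
```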


5.3. Convergence of the transition densities. Observe first that $\|p_{\varepsilon'} - p_\varepsilon\|_{L^2([0,1]^2)}$ is exactly the Hilbert-Schmidt norm distance $\|P_\Delta^{\varepsilon'} - P_\Delta^\varepsilon\|_{HS}$ between the transition operators derived from $p_{\varepsilon'}$ and $p_\varepsilon$, acting on the Hilbert space $L^2([0,1])$. Introduce

$$V := \Big\{f \in L^2([0,1]) \;\Big|\; \int f = 0\Big\} \qquad\text{and}\qquad V^\perp := \{f \in L^2([0,1]) \mid f \text{ constant}\}.$$

The transition operators coincide on $V^\perp$ and leave the space $V$ invariant, so that $\|P_\Delta^{\varepsilon'} - P_\Delta^\varepsilon\|_{HS} = \|(P_\Delta^{\varepsilon'} - P_\Delta^\varepsilon)|_V\|_{HS}$.

We take advantage of a key result on the continuous functional calculus. Let $f$ be Lipschitz with constant $\Lambda$ on the union of the spectra of two self-adjoint bounded operators $T_1$ and $T_2$. Then

$$\|f(T_1) - f(T_2)\|_{HS} \le \Lambda\,\|T_1 - T_2\|_{HS}; \tag{5.4}$$

see Kittaneh (1985). We proceed by bounding the Hilbert-Schmidt norm of the difference of the inverses of the generators and then transferring this bound to the transition operators via (5.4). By the functional calculus for operators on $V$, the function $f(z) = \exp(\Delta z^{-1})$ sends $(L_\varepsilon|_V)^{-1}$ to $P_\Delta^\varepsilon|_V$. Moreover, $f$ is uniformly Lipschitz continuous on $(-\infty,0)$, since $\Lambda := \sup_{z<0}|f'(z)| = 4\Delta^{-1}e^{-2} < \infty$. Thus, we arrive at

$$\|p_{\varepsilon'} - p_\varepsilon\|_{L^2([0,1]^2)} = \|(P_\Delta^{\varepsilon'} - P_\Delta^\varepsilon)|_V\|_{HS} \le \Lambda\,\|(L_{\varepsilon'}|_V)^{-1} - (L_\varepsilon|_V)^{-1}\|_{HS}.$$
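The constant $\Lambda = 4\Delta^{-1}e^{-2}$ and inequality (5.4) can both be illustrated numerically. With $w = -1/z > 0$ one has $|f'(z)| = \Delta w^2e^{-\Delta w}$, which is maximal at $w = 2/\Delta$, that is, at $z = -\Delta/2$. The second part of the sketch checks (5.4) in the Hilbert-Schmidt (Frobenius) norm for random negative definite matrices. These matrices are hypothetical stand-ins for $(L_\varepsilon|_V)^{-1}$.

```python
import numpy as np

delta = 0.1
Lam = 4.0 / (delta * np.e**2)                  # claimed sup_{z<0} |f'(z)| for f(z) = exp(delta/z)

def f(z):
    return np.exp(delta / z)

def abs_fprime(z):
    return delta / z**2 * np.exp(delta / z)

z = -np.logspace(-4, 4, 200_001)               # grid on (-inf, 0); the maximiser is z = -delta/2
print(abs_fprime(z).max(), Lam)

def apply_function(A, fn):
    """Continuous functional calculus for a symmetric matrix A: V diag(fn(eigenvalues)) V^T."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * fn(vals)) @ vecs.T

rng = np.random.default_rng(5)

def random_negative_definite(n):
    B = rng.standard_normal((n, n))
    return -(B @ B.T + 0.1 * np.eye(n))        # spectrum in (-inf, 0)

T1, T2 = random_negative_definite(8), random_negative_definite(8)
lhs = np.linalg.norm(apply_function(T1, f) - apply_function(T2, f), "fro")   # Hilbert-Schmidt norm
rhs = Lam * np.linalg.norm(T1 - T2, "fro")
print(lhs <= rhs)
```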
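For $g \in V$, the inverse of the generator on $V$ has the explicit form

$$(L_\varepsilon|_V)^{-1}g(x) = \int_0^1\Big(\int_0^xS_\varepsilon^{-1}(v)\big(v - \mathbf 1_{[y,1]}(v)\big)\,dv\Big)g(y)\,dy. \tag{5.5}$$

We use $S_{\varepsilon'}^{-1} - S_\varepsilon^{-1} = \pm2\gamma\psi_{jk}$ for some $k \in K_j$ and denote by $\Psi$ the primitive of $\psi$ with compact support. This gives

$$\begin{aligned}
\|(L_{\varepsilon'}|_V)^{-1} - (L_\varepsilon|_V)^{-1}\|_{HS}^2 &= \int_0^1\!\!\int_0^1\Big(\int_0^x2\gamma\psi_{jk}(v)\big(v - \mathbf 1_{[y,1]}(v)\big)\,dv\Big)^2dx\,dy \\
&= 4\gamma^22^{-j}\int_0^1\!\!\int_0^1\Big(x\Psi(2^jx - k) - \int_0^x\Psi(2^jv - k)\,dv - \Psi(2^j(x\vee y) - k) + \Psi(2^jy - k)\Big)^2dx\,dy \\
&\lesssim \gamma^22^{-j}\,\|\Psi(2^j\cdot)\|_{L^2}^2 \sim \gamma^22^{-2j}.
\end{aligned}$$

Consequently, $\|p_{\varepsilon'} - p_\varepsilon\|_{L^2}^2 \lesssim \gamma^22^{-2j} \sim 2^{-j(2s+3)}$ holds, and the constant can be made arbitrarily small by choosing $\gamma2^{j(s+1/2)}$ sufficiently small. Hence, the estimate (5.3) holds for this choice under the asymptotics $N2^{-j(2s+3)} \sim 1$, which is what remained to be proved.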


6. Technical results. We shall need several technical results, mainly to describe how certain quantities depend on the underlying diffusion parameters. The following result is in close analogy with Section IV.5 in Bass (1998).

Lemma 6.1. The second largest eigenvalue $\nu_1$ of the infinitesimal generator can be bounded away from zero:

$$\nu_1 \le -\inf_{x\in[0,1]}S(x) =: -s_0.$$

This eigenvalue is simple, and the corresponding eigenfunction $f_1$ is monotone.

Proof. The variational characterization of $\nu_1$ [Davies (1995), Section 4.5] and partial integration yield

$$\nu_1 = \sup_{\substack{\|f\|_\mu = 1\\ \langle f,1\rangle_\mu = 0}}\langle Lf,f\rangle_\mu = -\inf_{\substack{\|f\|_\mu = 1\\ \langle f,1\rangle_\mu = 0}}\int_0^1S(x)f'(x)^2\,dx.$$

Given the derivative $f'$, the function $f \in \mathrm{dom}(L)$ with $\langle f,1\rangle_\mu = 0$ is uniquely determined. Setting $M(x) := \mu([0,x])$, this function $f$ satisfies

$$0 = \int_0^1\Big(f(0) + \int_0^xf'(y)\,dy\Big)\mu(x)\,dx = f(0) + \int_0^1f'(y)\big(1 - M(y)\big)\,dy.$$

For $f, g \in \mathrm{dom}(L)$ with $\langle f,1\rangle_\mu = \langle g,1\rangle_\mu = 0$ we find

$$\begin{aligned}
\langle f,g\rangle_\mu &= \int_0^1\Big(f(0) + \int_0^xf'(y)\,dy\Big)\Big(g(0) + \int_0^xg'(z)\,dz\Big)\mu(x)\,dx \\
&= f(0)g(0) + f(0)\int_0^1g'(z)\big(1 - M(z)\big)\,dz + g(0)\int_0^1f'(y)\big(1 - M(y)\big)\,dy \\
&\quad + \int_0^1\!\!\int_0^1f'(y)g'(z)\big(1 - M(y\vee z)\big)\,dy\,dz \\
&= \int_0^1\!\!\int_0^1\big(M(y\wedge z) - M(y)M(z)\big)f'(y)g'(z)\,dy\,dz \\
&=: \int_0^1\!\!\int_0^1m(y,z)f'(y)g'(z)\,dy\,dz.
\end{aligned}$$

The kernel $m(y,z)$ is positive on $(0,1)^2$ and bounded by 1. Setting $u = f'$, we therefore obtain

$$\nu_1 = -\inf_u\frac{\int_0^1S(x)u(x)^2\,dx}{\int_0^1\!\int_0^1m(y,z)u(y)u(z)\,dy\,dz} \le -s_0\inf_u\frac{\|u\|_{L^2}^2}{\|u\|_{L^1}^2} \le -s_0.$$

Suppose the derivative of an eigenfunction $f_1$ changed sign. Then we could write $f_1' = u^+ - u^-$ with two nontrivial nonnegative functions $u^+, u^-$. Let $f_0$ be the antiderivative of $f_0' := u^+ + u^-$, normalized by $\langle f_0,1\rangle_\mu = 0$. It satisfies $\langle Lf_0,f_0\rangle_\mu = \langle Lf_1,f_1\rangle_\mu$, while $\|f_0\|_\mu$ is strictly greater than $\|f_1\|_\mu$ because $m(y,z)$ is positive. This contradicts the variational characterization of $\nu_1$, so all eigenfunctions corresponding to $\nu_1$ are monotone. Consequently, for any two eigenfunctions $f_1$ and $g_1$ the integrand in

$$\langle f_1,g_1\rangle_\mu = \int_0^1\!\!\int_0^1m(y,z)f_1'(y)g_1'(z)\,dy\,dz$$

does not change sign, and the whole integral does not vanish. We infer that the eigenspace of $\nu_1$ cannot contain two orthogonal functions and is thus one-dimensional. □
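The kernel representation $\langle f,g\rangle_\mu = \iint m(y,z)f'(y)g'(z)\,dy\,dz$ and the bound $\nu_1 \le -s_0$ can be checked by quadrature. As a hypothetical example we take Lebesgue measure ($M(x) = x$, so $m(y,z) = \min(y,z) - yz$) together with the reflected Brownian motion value $S = 1/2$ and the mean-zero test function $f(x) = \cos(\pi x)$.

```python
import numpy as np

n = 1000
h = 1.0 / n
t = (np.arange(n) + 0.5) * h                   # midpoint grid on [0,1]
M = t                                          # Lebesgue invariant measure: M(x) = mu([0,x]) = x
m = np.minimum.outer(M, M) - np.outer(M, M)    # m(y,z) = M(y ∧ z) - M(y) M(z)

f = np.cos(np.pi * t)                          # <f,1>_mu = 0
df = -np.pi * np.sin(np.pi * t)
norm_direct = np.sum(f**2) * h                 # <f,f>_mu, exact value 1/2
norm_kernel = df @ m @ df * h * h              # double integral of m(y,z) f'(y) f'(z)

S = 0.5                                        # reflected Brownian motion: S = 1/2 = s_0
rayleigh = -np.sum(S * df**2) * h / norm_direct   # <Lf,f>_mu / <f,f>_mu, equals nu_1 = -pi^2/2 here
print(norm_direct, norm_kernel, rayleigh)
```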


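Lemma 6.2. For $H_1, H_2 \in L^2([0,1])$ we have the following two variance estimates, uniformly over $\Theta_s$:

$$\operatorname{Var}_{\sigma,b}\Big[\frac1N\sum_{n=1}^NH_1(X_{n\Delta})\Big] \lesssim N^{-1}\,\mathbb{E}_{\sigma,b}\big[H_1(X_0)^2\big], \tag{6.1}$$

$$\operatorname{Var}_{\sigma,b}\Big[\frac1N\sum_{n=1}^NH_1(X_{(n-1)\Delta})H_2(X_{n\Delta})\Big] \lesssim N^{-1}\,\mathbb{E}_{\sigma,b}\big[H_1(X_0)^2H_2(X_\Delta)^2\big]. \tag{6.2}$$

Proof. Due to the uniform spectral gap $s_0$ over $\Theta_s$ (Lemma 6.1), $P_\Delta$ satisfies $\|P_\Delta f\|_\mu \le \gamma\|f\|_\mu$ with $\gamma := e^{-s_0\Delta} < 1$ for all $f \in L^2(\mu)$ with $\langle f,1\rangle_\mu = 0$.

We obtain the first estimate by considering the centered random variables $h_1(X_{k\Delta}) := H_1(X_{k\Delta}) - \mathbb{E}_{\sigma,b}[H_1(X_{k\Delta})]$, $k \in \mathbb{N}$:

$$\begin{aligned}
\operatorname{Var}_{\sigma,b}\Big[\frac1N\sum_{n=1}^NH_1(X_{n\Delta})\Big] &= \frac1{N^2}\sum_{m,n=1}^N\mathbb{E}_{\sigma,b}\big[h_1(X_{m\Delta})h_1(X_{n\Delta})\big] \\
&= \frac1{N^2}\sum_{m,n=1}^N\big\langle h_1,P_\Delta^{|m-n|}h_1\big\rangle_\mu \\
&\le \frac1{N^2}\sum_{m,n=1}^N\gamma^{|m-n|}\|h_1\|_\mu^2 \lesssim N^{-1}\,\mathbb{E}_{\sigma,b}\big[H_1(X_0)^2\big].
\end{aligned}$$

The second estimate follows along the same lines. We merely observe that for $m > n$, by the projection property of conditional expectations,

$$\mathbb{E}_{\sigma,b}\big[H_1(X_{(n-1)\Delta})H_2(X_{n\Delta})H_1(X_{(m-1)\Delta})H_2(X_{m\Delta})\big] = \big\langle H_2\cdot(P_\Delta H_1),\,P_\Delta^{m-n-1}\big(H_1\cdot(P_\Delta H_2)\big)\big\rangle_\mu$$

holds, where $H\cdot$ denotes the usual multiplication operator. □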
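The constant in (6.1) comes from the geometric sum $\sum_{m,n=1}^N\gamma^{|m-n|} = N + 2\sum_{d=1}^{N-1}(N-d)\gamma^d \le N(1+\gamma)/(1-\gamma)$. This yields $\operatorname{Var} \le N^{-1}\frac{1+\gamma}{1-\gamma}\|h_1\|_\mu^2$ with $\gamma = e^{-s_0\Delta}$. A quick check follows; the values of $s_0$ and $\Delta$ are hypothetical.

```python
import numpy as np

def covariance_weight_sum(N, gamma):
    """sum_{m,n=1}^N gamma^{|m-n|}, the factor controlling the variance in Lemma 6.2."""
    k = np.arange(N)
    return np.sum(gamma ** np.abs(k[:, None] - k[None, :]))

def covariance_weight_bound(N, gamma):
    """Closed-form bound N (1 + gamma) / (1 - gamma)."""
    return N * (1.0 + gamma) / (1.0 - gamma)

s0, delta = 0.5, 0.1                            # hypothetical spectral gap and sampling interval
gamma = np.exp(-s0 * delta)
print(covariance_weight_sum(500, gamma), covariance_weight_bound(500, gamma))
```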


Lemma 6.3. Uniformly over $\Theta_s$ the following norm equivalence holds:

$$\|f\|_{H^1} \sim \|(\mathrm{Id} - L)^{1/2}f\|_\mu \qquad\text{for all } f \in H^1.$$

Proof. The invariant measure $\mu$ and the function $S$ are uniformly bounded away from zero and infinity. Hence, with uniform constants, we obtain for $f \in \mathrm{dom}(L)$

$$\|f\|_{H^1}^2 \sim \langle f,f\rangle_\mu + \langle Sf',f'\rangle = \langle(\mathrm{Id} - L)f,f\rangle_\mu = \|(\mathrm{Id} - L)^{1/2}f\|_\mu^2.$$

By an approximation argument, this extends to all $f \in H^1 = \mathrm{dom}\big((\mathrm{Id} - L)^{1/2}\big)$. □

Proposition 6.4. Suppose $(\sigma_n(\cdot), b_n(\cdot)) \in \Theta_s$, $n \ge 0$, and

$$\lim_{n\to\infty}\|\sigma_n - \sigma_0\|_\infty = 0, \qquad \lim_{n\to\infty}\|b_n - b_0\|_\infty = 0.$$

Then the corresponding transition probability densities $(p_\Delta^{(n)})$ converge uniformly:

$$\lim_{n\to\infty}\|p_\Delta^{(n)} - p_\Delta^{(0)}\|_\infty = 0.$$

Proof. By the results of Stroock and Varadhan (1971), the corresponding diffusion processes $X^{(n)}$ converge weakly to $X^{(0)}$ for any fixed initial value $X_0 = x$. This implies in particular

$$\lim_{n\to\infty}\int_0^1p_\Delta^{(n)}(x,y)\varphi(y)\,dy = \int_0^1p_\Delta^{(0)}(x,y)\varphi(y)\,dy$$

for all continuous test functions $\varphi \in C([0,1])$ and all $x \in [0,1]$.

On the other hand, the functions $(p_\Delta^{(n)})_n$ form a relatively compact subset of $C([0,1]^2)$ by Proposition 6.7 and Sobolev embeddings. Testing with suitable functions $\varphi$ shows that any point of accumulation of $(p_\Delta^{(n)})_n$ in $C([0,1]^2)$ must equal $p_\Delta^{(0)}$. Consequently, $(p_\Delta^{(n)})_n$ is a relatively compact sequence with only one point of accumulation, and thus it converges. □


Proposition 6.5. For the class $\Theta_s$ there is a constant $\rho > 0$ such that, for all parameters $(\sigma(\cdot), b(\cdot))$, the eigenvalue $\kappa_1 = \kappa_1(\sigma,b)$ of $P_\Delta$ is uniformly separated: $\sigma(P_\Delta) \cap B(\kappa_1,2\rho) = \{\kappa_1\}$.

Furthermore, for all $0 < a < b < 1$ there is a uniform constant $C_{a,b} > 0$ such that the associated first eigenfunction $u_1 = u_1(\sigma,b)$ satisfies, for all $(\sigma,b) \in \Theta_s$,

$$\min_{x\in[a,b]}|u_1'(x)| \ge C_{a,b}.$$

Proof. We argue by contradiction. Assume that there is a sequence $(\sigma_n, b_n) \in \Theta_s$ whose eigenvalues satisfy $\kappa_1^{(n)} \to 1$ (or $\kappa_1^{(n)} - \kappa_2^{(n)} \to 0$, resp.). By the compactness of the Sobolev embeddings into $C([0,1])$, we can pass to a uniformly converging subsequence. Proposition 6.4 then shows that the corresponding transition densities converge uniformly, which implies that the transition operators converge in operator norm on $L^2([0,1])$. By Proposition 5.6 and Theorem 5.20 in Chatelin (1983), this entails the convergence of their eigenvalues with preservation of the multiplicities. The limiting operator is again associated with an elliptic reflected diffusion, and its eigenvalue $\kappa_1 = e^{\Delta\nu_1}$ is always simple (Lemma 6.1). This gives the contradiction.

For the second statement we use the same argument by contradiction. We construct transition operators $P_\Delta^{(n)}$ on the space $C([0,1])$ and infer that the following objects converge in supremum norm: the eigenfunctions $u_1^{(n)}$ [Chatelin (1983), Theorem 5.10], the invariant measures $\mu^{(n)}$ [see (3.1)] and the inverses of the functions $S^{(n)}$ [see (3.2)]. Therefore

$$\big(u_1^{(n)}\big)'(x) = \nu_1^{(n)}\big(S^{(n)}(x)\big)^{-1}\int_0^xu_1^{(n)}(v)\mu^{(n)}(v)\,dv$$

also converges in supremum norm. Since the derivative of the limiting eigenfunction does not vanish on $[a,b]$ (Lemma 6.1), $(u_1^{(n)})'$ cannot converge to zero on $[a,b]$. □

Lemma 6.6. The $L^2(\mu)$-normalized eigenfunction $u_k$ of the generator $L$ corresponding to the $(k+1)$st largest eigenvalue $\nu_k$ satisfies

$$\|u_k\|_{H^{s+1}} \le C\big(s, s_0, \|S^{-1}\|_{H^s}, \|\mu\|_{H^{s-1}}\big)\,|\nu_k|^{s+1},$$

where $C$ is a continuous function of its arguments.

Proof. We know that $\mu^{-1}(Su_k')' = \nu_ku_k$ and $u_k'(0) = 0$ hold, which imply

$$u_k'(x) = \nu_kS^{-1}(x)\int_0^xu_k(v)\mu(v)\,dv.$$

Suppose $u_k \in H^r$ with $r \in [0,s]$. Then the function $u_k\mu$ is in $H^{\min(r,s-1)}$ because $\mu \in H^{s-1}$ (Sobolev embeddings). Hence, its antiderivative is in $H^{\min(r,s-1)+1}$. Since $S^{-1} \in H^s$ holds, the right-hand side is an element of $H^{\min(r+1,s)}$. We conclude that the regularity $r$ of $u_k$ increases by 1 at each step, which implies that $u_k$ is in $H^{s+1}$.

Quantitatively, we obtain for $r \in [1,s]$, using the seminorm $|f|_r := \|f^{(r)}\|_{L^2}$,

$$\begin{aligned}
|u_k|_{r+1} &\le |\nu_k|\,C(r)\,\|S^{-1}\|_r\,\Big\|\int_0^{\cdot}u_k\mu\Big\|_r \\
&\le |\nu_k|\,C(r)\,\|S^{-1}\|_r\big(\|u_k\|_{L^1(\mu)} + |u_k\mu|_{r-1}\big) \\
&\le |\nu_k|\,C(r)\,\|S^{-1}\|_r\big(1 + |u_k|_{r-1}\|\mu\|_\infty + \|u_k\|_\infty\|\mu\|_{r-1}\big) \\
&\le |\nu_k|\,C(r)\,\|S^{-1}\|_r\big(1 + 2\|u_k\|_{r-1}\|\mu\|_{r-1}\big).
\end{aligned}$$

The estimate of the lemma follows by applying the bound $\|u_k\|_{r+1} \lesssim |\nu_k|\,\|S^{-1}\|_s\big(1 + \|u_k\|_{r-1}\|\mu\|_{s-1}\big)$ successively for $r = 1, 2, \ldots, \lfloor s\rfloor$ and finally for $r = s$. □


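Proposition 6.7. For $(\sigma,b) \in \Theta_s$, the corresponding transition probability density $p_\Delta = p_{\Delta,\sigma,b}$ satisfies

$$\sup_{(\sigma,b)\in\Theta_s}\|p_\Delta\|_{H^{s+1}\times H^s} < \infty.$$

Proof. The spectral decomposition of $P_\Delta$ yields

$$p_\Delta(x,y) = \mu(y)\sum_{k=0}^\infty e^{\nu_k\Delta}u_k(x)u_k(y), \qquad x,y \in [0,1].$$

The coefficients are uniformly elliptic and uniformly bounded, so $\nu_k \in [-C_1k^2, -C_2k^2]$ with constants $C_1, C_2 > 0$ that are uniform on $\Theta_s$ [see Davies (1995), adapting Example 4.6.1, page 93, to our situation]. From the preceding Lemma 6.6 and the Sobolev embedding $H^s \subset C([0,1])$ we infer

$$\begin{aligned}
\|p_\Delta\|_{H^{s+1}\times H^s} &\le \sum_{k=0}^\infty e^{\nu_k\Delta}\,\|u_k\|_{s+1}\,\|\mu u_k\|_s \\
&\lesssim \sum_{k=0}^\infty C\big(s,s_0,\|S^{-1}\|_s,\|\mu\|_{s-1}\big)^2e^{-C_2k^2\Delta}\big(1 + C_1k^2\big)^{2s+2}\|\mu\|_s,
\end{aligned}$$

which gives the desired uniform estimate. □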
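The spectral expansion in this proof can be made concrete for the reference model of Section 5.1, reflected Brownian motion. Here $\mu = 1$, $u_0 = 1$, $u_k = \sqrt2\cos(k\pi x)$ and $\nu_k = -(k\pi)^2/2$; this is the standard Neumann eigensystem, taken as our assumption for the illustration. The sketch compares the truncated series with the classical method-of-images formula for the same density. The Gaussian factor $e^{\nu_k\Delta}$ is what makes the series converge rapidly, as exploited above.

```python
import numpy as np

def p_spectral(x, y, delta, K=200):
    """p_Delta(x,y) = mu(y) sum_k e^{nu_k Delta} u_k(x) u_k(y) for reflected BM on [0,1]:
    mu = 1, u_0 = 1, u_k = sqrt(2) cos(k pi x), nu_k = -(k pi)^2 / 2."""
    k = np.arange(1, K + 1)
    return 1.0 + 2.0 * np.sum(np.exp(-(k * np.pi) ** 2 * delta / 2)
                              * np.cos(k * np.pi * x) * np.cos(k * np.pi * y))

def p_images(x, y, delta, M=50):
    """Same density by the method of images (reflections of the Gaussian kernel at 0 and 1)."""
    m = np.arange(-M, M + 1)

    def phi(d):
        return np.exp(-d**2 / (2 * delta)) / np.sqrt(2 * np.pi * delta)

    return np.sum(phi(y - x + 2 * m) + phi(y + x + 2 * m))

delta = 0.1
print(p_spectral(0.3, 0.7, delta), p_images(0.3, 0.7, delta))   # the two representations agree
```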